We screened a CRISPR interference (CRISPRi) library of >9000 Saccharomyces cerevisiae strains in which >98% of all essential and respiratory growth-essential genes were targeted with multiple gRNAs. The screen was performed on the high-throughput, high-resolution scan-o-matic platform (Zackrisson et al., 2016), where each strain is grown and analyzed separately so that high-resolution growth curves are generated without influence or competition from other strains.
In an ideal library screen (with all strains derived from the same background), the phenotypes under a given test condition should be approximately normally distributed and vary widely, so that the best and the worst performers in the library can be picked out. For this purpose, we need to identify a concentration of the stressor (in this case acetic acid) that is severe enough to induce large phenotypic variability while still allowing most strains to grow and yield a quantitative phenotype. Therefore, plates 7 and 8 were pre-screened at different acetic acid concentrations (0, 50, 75, 100 and 150 mM) to identify an appropriate concentration for screening the whole library. Unfortunately, the spatial control strain at this point was BY4741, which did not grow at 150 mM (BY4741 was later replaced with CC23, one of the CRISPRi control strains, which carries a gRNA non-homologous to the Saccharomyces cerevisiae genome). Therefore, we compared the results using the absolute generation time (without any normalization for spatial bias). We assumed that the phenotypic variability due to spatial bias is very similar across the test plates; since at this point we only look at the phenotypic variability among the strains, it should not severely influence the conclusion of this titration round. The raw absolute data for plates 7 and 8 are available in the SOM_AA_TITRA folder within the RAW_DATA folder. The files are named in the following format:
phenotypes.Absolute.Atc7.5_aa[acetic acid concentration in mM]_p[plate number].csv
For example, the result of plate 7 in 50mM of acetic acid is available in the file phenotypes.Absolute.Atc7.5_aa50_p7.csv
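As a minimal sketch, the file name for any concentration and plate could be assembled from the naming pattern above. The helper name `titration_file` is hypothetical and not part of the original analysis, and the sketch assumes that the 0 mM files follow the same pattern as the others.

```r
# Hypothetical helper: build the raw-data file name for one titration plate.
titration_file <- function(conc_mM, plate) {
  paste0("phenotypes.Absolute.Atc7.5_aa", conc_mM, "_p", plate, ".csv")
}

# Plate 7 at 50 mM acetic acid
titration_file(50, 7)

# Full paths for every pre-screened concentration of plates 7 and 8
# (assumes the folder layout RAW_DATA/SOM_AA_TITRA described above)
grid <- expand.grid(conc_mM = c(0, 50, 75, 100, 150), plate = c(7, 8))
titration_paths <- file.path("RAW_DATA", "SOM_AA_TITRA",
                             titration_file(grid$conc_mM, grid$plate))
```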
For ease of analysis, we compiled the data into a single .csv file, which is available in the COMPILED_DATA folder.
Acetic acid titration data : 20210120_AA_titration_absolute_compiled.csv
AA_titration_data <- read.csv("COMPILED_DATA/20210120_AA_titration_absolute_compiled.csv", na.strings = "NoGrowth")
Install packages. Of these, ggplot2 and reshape will be used frequently later for data visualization.
ggplot2
reshape
ggridges
Prepare the data in the long format required by the ggplot2 package using reshape()
# Convert the condition columns (columns 3 to 7) to long format
AA_titration_data_reshape <- reshape(data = AA_titration_data,
                                     idvar = "gRNA_name",
                                     varying = colnames(AA_titration_data)[3:7],
                                     v.names = "Generation_time",
                                     new.row.names = 1:30000,
                                     direction = "long",
                                     timevar = "Condition",
                                     times = colnames(AA_titration_data)[3:7])
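To illustrate what the wide-to-long conversion does, here is a small self-contained sketch on an invented table. The strain names, column names and values are made up for illustration and do not come from the titration data.

```r
# Toy wide table: one row per strain, one column per condition
# (strain names and values are invented for illustration).
wide <- data.frame(
  gRNA_name = c("strainA", "strainB"),
  Plate     = c(7, 8),
  aa0       = c(2.5, 2.6),
  aa50      = c(3.1, 3.4),
  stringsAsFactors = FALSE
)

# Stack the condition columns into one value column ("Generation_time")
# and record the source column name in "Condition".
long <- reshape(
  data      = wide,
  idvar     = "gRNA_name",
  varying   = c("aa0", "aa50"),
  v.names   = "Generation_time",
  timevar   = "Condition",
  times     = c("aa0", "aa50"),
  direction = "long"
)

# One row per strain-condition pair: 2 strains x 2 conditions
nrow(long)
```

In the long table, each condition becomes a level of a single column, which is the layout ggplot2 expects when one density trace is drawn per condition.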
Figure 1: Density trace of the absolute generation time of the strains in plates 7 and 8 at different concentrations of acetic acid
At 150 mM we observed the largest phenotypic variability among the strains of plates 7 and 8. Therefore, 150 mM was selected as the acetic acid concentration for screening the entire library.
The phenotypic data generated in the scan-o-matic screening are exported in .csv format. We extract both the absolute and the normalized phenotypes.
The CRISPRi strains in the library were arrayed on 24 plates in 384 format. Each CRISPRi plate was subjected to two different conditions (basal and 150 mM acetic acid). Therefore, four different files are generated for each plate (absolute and normalized data for each condition). All files generated in a single independent experimental round are stored in a single folder.
SOM_SCR_R001 : Raw data for round1
SOM_SCR_R002 : Raw data for round2
ABSOLUTE DATA
The Absolute dataset gives the extracted phenotypes without any spatial normalization
NORMALIZED DATA
The Normalized dataset is generated after removal of any spatial bias. The values are on a log2 scale and are referred to as Log Strain Coefficient (LSC) values
FILE NAMING
Each file name contains the plate identifier, so that the file can easily be called programmatically
E.g. the file with the absolute data of plate 1 in the basal (Ctrl) condition contains the string
Ctrl1.phenotypes.Absolute
and
the file with the normalized data of plate 1 under acetic acid (aa) stress contains the string
aa1.phenotypes.Normalized
At the end of this data import session, a single data.frame containing the data of all 24 plates will be generated. The whole dataset will be labeled with the strain attributes using the metadata key file (provided in the COMPILED_DATA folder). The data import below is shown only for the Round2 dataset; the Round1 dataset can be generated in the same way by changing the folder location
METADATA KEY FILE : library_keyfile1536.csv
Metadata_CRISPRi <- read.csv("COMPILED_DATA/library_keyfile1536.csv", na.strings = "#N/A", stringsAsFactors = FALSE)
str(Metadata_CRISPRi)
## 'data.frame': 36864 obs. of 11 variables:
## $ SL_No : num 1 2 3 4 5 6 7 8 9 10 ...
## $ gRNA_name : chr "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
## $ Seq : chr "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
## $ SOURCEPLATEID : chr "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
## $ SOURCEDENSITY : chr "384A" "384A" "384A" "384A" ...
## $ SOURCECOLONYCOLUMN: int 1 1 2 2 3 3 4 4 5 5 ...
## $ SOURCECOLONYROW : chr "A" "A" "A" "A" ...
## $ border : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ GENE : chr "FBP26" "FBP26" "HMI1" "HMI1" ...
## $ Control.gRNA : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Location_1536 : chr "A1" "A2" "A3" "A4" ...
# Read the absolute basal (Ctrl) data of all 24 plates and stack them
m <- vector(mode = "character", length = 0)
file.names <- vector(mode = "character", length = 0)
temp_df <- data.frame()
data_Ctrl_Abs <- data.frame()
for (i in 1:24) {
  # Pattern identifying the file of plate i
  m <- paste0("Ctrl", i, ".phenotypes.Absolute")
  file.names[i] <- dir("RAW_DATA/SOM_SCR_R002/", pattern = m, full.names = TRUE)
  # "NoGrowth" entries are read as NA
  temp_df <- read.csv(file.names[i], na.strings = "NoGrowth")
  data_Ctrl_Abs <- rbind(data_Ctrl_Abs, temp_df)
}
str(data_Ctrl_Abs)
## 'data.frame': 36864 obs. of 18 variables:
## $ Plate : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Row : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Column : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Phenotypes.InitialValue : num 80116 83495 92467 97104 99622 ...
## $ Phenotypes.ExperimentBaseLine : num 81419 84504 94039 98641 101354 ...
## $ Phenotypes.ExperimentEndAverage : num 8615209 6710686 6037319 5554230 5234758 ...
## $ Phenotypes.ColonySize48h : num 6487316 5329111 4946774 4684166 4440201 ...
## $ Phenotypes.ChapmanRichardsParam2 : num 14.5 12.8 42.7 17.6 18.3 ...
## $ Phenotypes.ChapmanRichardsParam3 : num -2.74 -2.65 -2.58 -2.52 -2.47 ...
## $ Phenotypes.ChapmanRichardsParamXtra : num 16 16 16.2 16.3 16.3 ...
## $ Phenotypes.ChapmanRichardsParam1 : num 1.95 1.89 1.84 1.8 1.77 ...
## $ Phenotypes.ChapmanRichardsParam4 : num -31.33 -31.47 -3.93 -2.78 -2.48 ...
## $ Phenotypes.GenerationTimeStErrOfEstimate: num 0.012879 0.002174 0.000633 0.001667 0.000759 ...
## $ Phenotypes.ExperimentGrowthYield : num 8533790 6626182 5943280 5455589 5133404 ...
## $ Phenotypes.GenerationTime : num 2.53 2.48 2.51 2.56 2.53 ...
## $ Phenotypes.ExperimentPopulationDoublings: num 6.73 6.31 6 5.82 5.69 ...
## $ Phenotypes.ChapmanRichardsFit : num 0.999 0.999 0.999 0.999 0.999 ...
## $ Phenotypes.GenerationTimeWhen : num 6.14 4.1 4.1 3.76 3.76 ...
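Several phenotypes are extracted. However, the most useful for this study will be
Phenotypes.ExperimentGrowthYield (yield, Y)
Phenotypes.GenerationTime (generation time, GT)
Extract only these two columns (columns 14 and 15) into the final data.frame
Rename the columns to prevent any ambiguity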
data_Ctrl_Abs_Trim <- data_Ctrl_Abs[, 14:15]
colnames(data_Ctrl_Abs_Trim) <- c("CTRL_Y_ABS", "CTRL_GT_ABS")
str(data_Ctrl_Abs_Trim)
## 'data.frame': 36864 obs. of 2 variables:
## $ CTRL_Y_ABS : num 8533790 6626182 5943280 5455589 5133404 ...
## $ CTRL_GT_ABS: num 2.53 2.48 2.51 2.56 2.53 ...
Following the same strategy as above, import the absolute data of the acetic acid condition
m <- vector(mode = "character", length = 0)
file.names<-vector(mode = "character", length = 0)
temp_df<-data.frame()
data_AA_Abs <- data.frame()
for (i in 1:24) {
  m <- paste0("aa", i, ".phenotypes.Absolute")
  file.names[i] <- dir("RAW_DATA/SOM_SCR_R002/", pattern = m, full.names = TRUE)
  temp_df <- read.csv(file.names[i], na.strings = "NoGrowth")
  data_AA_Abs <- rbind(data_AA_Abs, temp_df)
}
data_AA_Abs_Trim <- data_AA_Abs[, 14:15]
colnames(data_AA_Abs_Trim) <- c("AA_Y_ABS", "AA_GT_ABS")
m <- vector(mode = "character", length = 0)
file.names<-vector(mode = "character", length = 0)
temp_df<-data.frame()
data_Ctrl_Norm <- data.frame()
for (i in 1:24) {
  m <- paste0("Ctrl", i, ".phenotypes.Normalized")
  file.names[i] <- dir("RAW_DATA/SOM_SCR_R002/", pattern = m, full.names = TRUE)
  temp_df <- read.csv(file.names[i], na.strings = "NoGrowth")
  data_Ctrl_Norm <- rbind(data_Ctrl_Norm, temp_df)
}
str(data_Ctrl_Norm)
## 'data.frame': 36864 obs. of 8 variables:
## $ Plate : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Row : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Column : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Phenotypes.ExperimentGrowthYield : num 0.755 0.39 0.233 0.484 0.396 ...
## $ Phenotypes.GenerationTime : num 0.0505 0.0204 0.0389 -0.0173 -0.0327 ...
## $ Phenotypes.ExperimentPopulationDoublings: num 0.1907 0.0991 0.0272 0.113 0.0818 ...
## $ Phenotypes.ExperimentBaseLine : num -0.0884 -0.0347 0.1195 0.0367 0.0758 ...
## $ Phenotypes.ColonySize48h : num 0.584 0.301 0.193 0.397 0.32 ...
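The most useful for this study will be
Phenotypes.ExperimentGrowthYield (yield, Y)
Phenotypes.GenerationTime (generation time, GT)
Extract only these two columns (columns 4 and 5)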
data_Ctrl_Norm_Trim <- data_Ctrl_Norm[, 4:5]
colnames(data_Ctrl_Norm_Trim) <- c("CTRL_Y_NORM", "CTRL_GT_NORM")
Same as above, for the normalized data of the acetic acid condition
m <- vector(mode = "character", length = 0)
file.names<-vector(mode = "character", length = 0)
temp_df<-data.frame()
data_AA_Norm <- data.frame()
for (i in 1:24) {
  m <- paste0("aa", i, ".phenotypes.Normalized")
  file.names[i] <- dir("RAW_DATA/SOM_SCR_R002/", pattern = m, full.names = TRUE)
  temp_df <- read.csv(file.names[i], na.strings = "NoGrowth")
  data_AA_Norm <- rbind(data_AA_Norm, temp_df)
}
data_AA_Norm_Trim <- data_AA_Norm[, 4:5]
colnames(data_AA_Norm_Trim) <- c("AA_Y_NORM", "AA_GT_NORM")
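The four import loops above differ only in the condition prefix (Ctrl or aa) and the dataset type (Absolute or Normalized). As a sketch, they could be folded into one helper. The function name `read_som_plates` and its arguments are hypothetical and not part of the original script; the sketch assumes exactly one matching file per plate in the folder.

```r
# Hypothetical helper (not part of the original script): read one file per
# plate from `folder` and stack the plates in plate order.
read_som_plates <- function(folder, condition, type, n_plates = 24) {
  files <- dir(folder, full.names = TRUE)
  plates <- lapply(seq_len(n_plates), function(i) {
    wanted <- paste0(condition, i, ".phenotypes.", type)
    # fixed = TRUE matches the dots literally instead of as regex wildcards
    hits <- files[grepl(wanted, basename(files), fixed = TRUE)]
    stopifnot(length(hits) == 1)  # exactly one file expected per plate
    read.csv(hits, na.strings = "NoGrowth")
  })
  do.call(rbind, plates)
}

# Example call, assuming the folder layout described above:
# data_AA_Norm <- read_som_plates("RAW_DATA/SOM_SCR_R002", "aa", "Normalized")
```

Collecting the plates in a list and binding them once avoids growing the data.frame inside the loop, and the `stopifnot()` check makes a missing or duplicated plate file fail loudly.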
The trimmed datasets are combined to obtain the final data.frame. The combined data.frame is labeled as data from ROUND2
R <- rep("2nd_round", 36864)
Round_ID <- data.frame(R, stringsAsFactors = FALSE)
whole_data_R2 <- cbind(Metadata_CRISPRi,
Round_ID,
data_Ctrl_Abs_Trim,
data_AA_Abs_Trim,
data_Ctrl_Norm_Trim,
data_AA_Norm_Trim)
colnames(whole_data_R2)[12] <- "Round_ID"
str(whole_data_R2)
## 'data.frame': 36864 obs. of 20 variables:
## $ SL_No : num 1 2 3 4 5 6 7 8 9 10 ...
## $ gRNA_name : chr "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
## $ Seq : chr "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
## $ SOURCEPLATEID : chr "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
## $ SOURCEDENSITY : chr "384A" "384A" "384A" "384A" ...
## $ SOURCECOLONYCOLUMN: int 1 1 2 2 3 3 4 4 5 5 ...
## $ SOURCECOLONYROW : chr "A" "A" "A" "A" ...
## $ border : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ GENE : chr "FBP26" "FBP26" "HMI1" "HMI1" ...
## $ Control.gRNA : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Location_1536 : chr "A1" "A2" "A3" "A4" ...
## $ Round_ID : chr "2nd_round" "2nd_round" "2nd_round" "2nd_round" ...
## $ CTRL_Y_ABS : num 8533790 6626182 5943280 5455589 5133404 ...
## $ CTRL_GT_ABS : num 2.53 2.48 2.51 2.56 2.53 ...
## $ AA_Y_ABS : num 2090439 2241914 1861277 1920070 1957912 ...
## $ AA_GT_ABS : num 8.92 8.43 9.19 8.91 8.87 ...
## $ CTRL_Y_NORM : num 0.755 0.39 0.233 0.484 0.396 ...
## $ CTRL_GT_NORM : num 0.0505 0.0204 0.0389 -0.0173 -0.0327 ...
## $ AA_Y_NORM : num 0.373 0.474 0.205 0.386 0.415 ...
## $ AA_GT_NORM : num 0.019 -0.0614 0.0621 -0.2135 -0.219 ...
The results from Round1 are already compiled into a .csv file in the COMPILED_DATA folder
Results 1st Round : 20190903_CRISPRi_Screen_aa_1st_round.csv
Import the dataset and label as data from ROUND1
First_round <- read.csv("COMPILED_DATA/20190903_CRISPRi_Screen_aa_1st_round.csv",
na.strings = c("#N/A", "NoGrowth"),
stringsAsFactors = FALSE)
R <- rep("1st_round", 36864)
Round_ID <- data.frame(R, stringsAsFactors = FALSE)
whole_data_R1 <- cbind(Metadata_CRISPRi, Round_ID, First_round[, 12:19])
colnames(whole_data_R1)[12] <- "Round_ID"
str(whole_data_R1)
## 'data.frame': 36864 obs. of 20 variables:
## $ SL_No : num 1 2 3 4 5 6 7 8 9 10 ...
## $ gRNA_name : chr "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
## $ Seq : chr "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
## $ SOURCEPLATEID : chr "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
## $ SOURCEDENSITY : chr "384A" "384A" "384A" "384A" ...
## $ SOURCECOLONYCOLUMN: int 1 1 2 2 3 3 4 4 5 5 ...
## $ SOURCECOLONYROW : chr "A" "A" "A" "A" ...
## $ border : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ GENE : chr "FBP26" "FBP26" "HMI1" "HMI1" ...
## $ Control.gRNA : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Location_1536 : chr "A1" "A2" "A3" "A4" ...
## $ Round_ID : chr "1st_round" "1st_round" "1st_round" "1st_round" ...
## $ CTRL_Y_ABS : num 7046465 5380541 4841666 4517281 4209174 ...
## $ CTRL_GT_ABS : num 3.15 3.64 3.18 3.04 3.13 ...
## $ AA_Y_ABS : num 833657 844666 708042 734725 717147 ...
## $ AA_GT_ABS : num 13.1 13.6 14.5 14.7 13.6 ...
## $ CTRL_Y_NORM : num 0.899 0.51 0.358 0.565 0.463 ...
## $ CTRL_GT_NORM : num -0.10076 0.10855 -0.08699 0.00504 0.0462 ...
## $ AA_Y_NORM : num -0.362 -0.343 -0.598 -0.642 -0.677 ...
## $ AA_GT_NORM : num 0.273 0.331 0.423 0.425 0.313 ...
whole_data_CRISPRi_aa <- rbind(whole_data_R1, whole_data_R2)
In this study, most of the downstream analysis was performed using the phenotype generation time (GT)
In this session, downstream data processing and statistical analysis of SCAN-O-MATIC raw output will be performed
The Log Phenotypic Index (LPI) of a strain is the difference between its normalized generation time (GT) or yield (Y) (LSC, see IMPORT SCAN-O-MATIC RAW DATA) on the acetic acid stress plate and its value in the basal condition. It gives a RELATIVE estimate of how a strain performed under acetic acid stress compared with the basal condition.
The RELATIVE GENERATION TIME i.e. LPI_GT = LSC_GT_Acetic_Acid - LSC_GT_Basal
whole_data_CRISPRi_aa[, 21] <- whole_data_CRISPRi_aa[, 19]-whole_data_CRISPRi_aa[, 17]
whole_data_CRISPRi_aa[, 22] <- whole_data_CRISPRi_aa[, 20]-whole_data_CRISPRi_aa[, 18]
colnames(whole_data_CRISPRi_aa)[21] <- "LPI_Y"
colnames(whole_data_CRISPRi_aa)[22] <- "LPI_GT"
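The same calculation can be written with column names instead of column positions, which makes the subtraction easier to check. The sketch below uses a toy data.frame with invented values as a stand-in for whole_data_CRISPRi_aa.

```r
# Toy stand-in for whole_data_CRISPRi_aa (values invented for illustration).
toy <- data.frame(
  CTRL_Y_NORM  = c(0.90, 0.51),
  CTRL_GT_NORM = c(-0.10, 0.11),
  AA_Y_NORM    = c(-0.36, -0.34),
  AA_GT_NORM   = c(0.27, 0.33)
)

# LPI = LSC under acetic acid - LSC under the basal condition
toy$LPI_Y  <- toy$AA_Y_NORM  - toy$CTRL_Y_NORM
toy$LPI_GT <- toy$AA_GT_NORM - toy$CTRL_GT_NORM
toy
```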
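Plate-wise batch correction was performed, separately for each round and condition, by subtracting the median of the LSC GT values of all individual colonies on a plate from the LSC GT value of each colony growing on that plate. The same correction was applied to the LSC Y values.
i.e. if StrainX is growing in the basal condition on plate Z, the corrected LSC_GT value for StrainX in the basal condition is the following:
LSC_GT_Basal_corrected (StrainX) = LSC_GT_Basal (StrainX) - median(LSC_GT_Basal of all colonies on plate Z)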
plate_ID <- as.character(unique(whole_data_CRISPRi_aa$SOURCEPLATEID))
whole_data_CRISPRi_aa_corrected <- whole_data_CRISPRi_aa
med_LogLSCctrl_RND1_GT <- vector(mode = "integer", length = 0)
med_LogLSCaa_RND1_GT <- vector(mode = "integer", length = 0)
med_LogLSCctrl_RND2_GT <- vector(mode = "integer", length = 0)
med_LogLSCaa_RND2_GT <- vector(mode = "integer", length = 0)
med_LogLSCctrl_RND1_Y <- vector(mode = "integer", length = 0)
med_LogLSCaa_RND1_Y <- vector(mode = "integer", length = 0)
med_LogLSCctrl_RND2_Y <- vector(mode = "integer", length = 0)
med_LogLSCaa_RND2_Y <- vector(mode = "integer", length = 0)
for(i in 1:24){
med_LogLSCctrl_RND1_GT[i] <- median(whole_data_CRISPRi_aa_corrected$CTRL_GT_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
& !is.na(whole_data_CRISPRi_aa_corrected$CTRL_GT_NORM)
& whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round")])
med_LogLSCaa_RND1_GT[i] <- median(whole_data_CRISPRi_aa_corrected$AA_GT_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
& !is.na(whole_data_CRISPRi_aa_corrected$AA_GT_NORM)
& whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round")])
med_LogLSCctrl_RND2_GT[i] <- median(whole_data_CRISPRi_aa_corrected$CTRL_GT_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
& !is.na(whole_data_CRISPRi_aa_corrected$CTRL_GT_NORM)
& whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round")])
med_LogLSCaa_RND2_GT[i] <- median(whole_data_CRISPRi_aa_corrected$AA_GT_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
& !is.na(whole_data_CRISPRi_aa_corrected$AA_GT_NORM)
& whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round")])
med_LogLSCctrl_RND1_Y[i] <- median(whole_data_CRISPRi_aa_corrected$CTRL_Y_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
& !is.na(whole_data_CRISPRi_aa_corrected$CTRL_Y_NORM)
& whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round")])
med_LogLSCaa_RND1_Y[i] <- median(whole_data_CRISPRi_aa_corrected$AA_Y_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
& !is.na(whole_data_CRISPRi_aa_corrected$AA_Y_NORM)
& whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round")])
med_LogLSCctrl_RND2_Y[i] <- median(whole_data_CRISPRi_aa_corrected$CTRL_Y_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
& !is.na(whole_data_CRISPRi_aa_corrected$CTRL_Y_NORM)
& whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round")])
med_LogLSCaa_RND2_Y[i] <- median(whole_data_CRISPRi_aa_corrected$AA_Y_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
& !is.na(whole_data_CRISPRi_aa_corrected$AA_Y_NORM)
& whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round")])
whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round") , 23] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round"), 17] - med_LogLSCctrl_RND1_Y[i]
whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round") , 24] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round"), 18] - med_LogLSCctrl_RND1_GT[i]
whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round") , 23] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round"), 17] - med_LogLSCctrl_RND2_Y[i]
whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round") , 24] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round"), 18] - med_LogLSCctrl_RND2_GT[i]
whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round") , 25] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round"), 19] - med_LogLSCaa_RND1_Y[i]
whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round") , 26] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round"), 20] - med_LogLSCaa_RND1_GT[i]
whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round") , 25] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round"), 19] - med_LogLSCaa_RND2_Y[i]
whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round") , 26] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round"), 20] - med_LogLSCaa_RND2_GT[i]
}
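As a compact illustration of the median correction performed in the loop above, the sketch below applies the same subtraction with base R's `ave()`, grouping by plate and round. The toy data and the helper name `center_on_median` are invented for illustration; this is an alternative formulation, not the code used to produce the results.

```r
# Toy data: two plates, one round (values invented for illustration).
toy <- data.frame(
  SOURCEPLATEID = c("P1", "P1", "P1", "P2", "P2", "P2"),
  Round_ID      = "1st_round",
  CTRL_GT_NORM  = c(0.1, 0.3, NA, -0.2, 0.0, 0.4)
)

# Subtract the plate-and-round median from every colony on that plate.
# NA colonies are ignored for the median and stay NA after correction.
center_on_median <- function(v) v - median(v, na.rm = TRUE)
toy$CTRL_GT_NORM_CR <- ave(toy$CTRL_GT_NORM,
                           toy$SOURCEPLATEID, toy$Round_ID,
                           FUN = center_on_median)
toy
```

After the correction, the median of each plate within a round is zero, so plates can be compared on a common baseline.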
Estimate the corrected LPI values (see ESTIMATE THE LOG PHENOTYPIC INDEX (LPI) VALUES) based on the corrected LSC values
i.e. LPI_GT_corrected = LSC_GT_Acetic_Acid_corrected - LSC_GT_Basal_corrected
Estimate the corrected LPI_Y
whole_data_CRISPRi_aa_corrected[, 27] <- whole_data_CRISPRi_aa_corrected[, 25] - whole_data_CRISPRi_aa_corrected[, 23]
Estimate the corrected LPI_GT
whole_data_CRISPRi_aa_corrected[, 28] <- whole_data_CRISPRi_aa_corrected[, 26] - whole_data_CRISPRi_aa_corrected[, 24]
colnm <- colnames(whole_data_CRISPRi_aa)[17:22]
colnm <- paste0(colnm, "_CR")
colnames(whole_data_CRISPRi_aa_corrected)[23:28] <- colnm
str(whole_data_CRISPRi_aa_corrected)
## 'data.frame': 73728 obs. of 28 variables:
## $ SL_No : num 1 2 3 4 5 6 7 8 9 10 ...
## $ gRNA_name : chr "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
## $ Seq : chr "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
## $ SOURCEPLATEID : chr "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
## $ SOURCEDENSITY : chr "384A" "384A" "384A" "384A" ...
## $ SOURCECOLONYCOLUMN: int 1 1 2 2 3 3 4 4 5 5 ...
## $ SOURCECOLONYROW : chr "A" "A" "A" "A" ...
## $ border : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ GENE : chr "FBP26" "FBP26" "HMI1" "HMI1" ...
## $ Control.gRNA : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Location_1536 : chr "A1" "A2" "A3" "A4" ...
## $ Round_ID : chr "1st_round" "1st_round" "1st_round" "1st_round" ...
## $ CTRL_Y_ABS : num 7046465 5380541 4841666 4517281 4209174 ...
## $ CTRL_GT_ABS : num 3.15 3.64 3.18 3.04 3.13 ...
## $ AA_Y_ABS : num 833657 844666 708042 734725 717147 ...
## $ AA_GT_ABS : num 13.1 13.6 14.5 14.7 13.6 ...
## $ CTRL_Y_NORM : num 0.899 0.51 0.358 0.565 0.463 ...
## $ CTRL_GT_NORM : num -0.10076 0.10855 -0.08699 0.00504 0.0462 ...
## $ AA_Y_NORM : num -0.362 -0.343 -0.598 -0.642 -0.677 ...
## $ AA_GT_NORM : num 0.273 0.331 0.423 0.425 0.313 ...
## $ LPI_Y : num -1.262 -0.854 -0.956 -1.207 -1.14 ...
## $ LPI_GT : num 0.374 0.223 0.51 0.42 0.266 ...
## $ CTRL_Y_NORM_CR : num 0.918 0.529 0.376 0.583 0.482 ...
## $ CTRL_GT_NORM_CR : num -0.0877 0.1216 -0.074 0.0181 0.0592 ...
## $ AA_Y_NORM_CR : num -0.0736 -0.0547 -0.3092 -0.3534 -0.3883 ...
## $ AA_GT_NORM_CR : num 0.19 0.248 0.34 0.342 0.229 ...
## $ LPI_Y_CR : num -0.991 -0.583 -0.686 -0.937 -0.87 ...
## $ LPI_GT_CR : num 0.278 0.127 0.414 0.323 0.17 ...
whole_data_CRISPRi_aa_2 <- whole_data_CRISPRi_aa_corrected[, c(1:16, 23:28)]
colnames(whole_data_CRISPRi_aa_2)[17:22] <- colnames(whole_data_CRISPRi_aa)[17:22]
str(whole_data_CRISPRi_aa_2)
## 'data.frame': 73728 obs. of 22 variables:
## $ SL_No : num 1 2 3 4 5 6 7 8 9 10 ...
## $ gRNA_name : chr "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
## $ Seq : chr "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
## $ SOURCEPLATEID : chr "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
## $ SOURCEDENSITY : chr "384A" "384A" "384A" "384A" ...
## $ SOURCECOLONYCOLUMN: int 1 1 2 2 3 3 4 4 5 5 ...
## $ SOURCECOLONYROW : chr "A" "A" "A" "A" ...
## $ border : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ GENE : chr "FBP26" "FBP26" "HMI1" "HMI1" ...
## $ Control.gRNA : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Location_1536 : chr "A1" "A2" "A3" "A4" ...
## $ Round_ID : chr "1st_round" "1st_round" "1st_round" "1st_round" ...
## $ CTRL_Y_ABS : num 7046465 5380541 4841666 4517281 4209174 ...
## $ CTRL_GT_ABS : num 3.15 3.64 3.18 3.04 3.13 ...
## $ AA_Y_ABS : num 833657 844666 708042 734725 717147 ...
## $ AA_GT_ABS : num 13.1 13.6 14.5 14.7 13.6 ...
## $ CTRL_Y_NORM : num 0.918 0.529 0.376 0.583 0.482 ...
## $ CTRL_GT_NORM : num -0.0877 0.1216 -0.074 0.0181 0.0592 ...
## $ AA_Y_NORM : num -0.0736 -0.0547 -0.3092 -0.3534 -0.3883 ...
## $ AA_GT_NORM : num 0.19 0.248 0.34 0.342 0.229 ...
## $ LPI_Y : num -0.991 -0.583 -0.686 -0.937 -0.87 ...
## $ LPI_GT : num 0.278 0.127 0.414 0.323 0.17 ...
Construct a new data structure in which the data for each strain (i.e. each unique guide RNA) are in a separate row, with the replicates from the first and second round side by side. Also add the mean, median and standard deviation for each phenotype
Data_CRISPRi_aa <- subset(whole_data_CRISPRi_aa_2, whole_data_CRISPRi_aa_2$gRNA_name!="SP_Ctrl_CC23")
df_unique_sgRNA <- data.frame(table(Data_CRISPRi_aa$gRNA_name))
R1<-vector(mode = "integer", length = 0)
R2<-vector(mode = "integer", length = 0)
test2<-data.frame()
n<-nrow(df_unique_sgRNA)
for(i in 1:n){
R1 <- which(Data_CRISPRi_aa$gRNA_name==df_unique_sgRNA$Var1[i] & Data_CRISPRi_aa$Round_ID=="1st_round")
R2 <- which(Data_CRISPRi_aa$gRNA_name==df_unique_sgRNA$Var1[i] & Data_CRISPRi_aa$Round_ID=="2nd_round")
test1 <- Data_CRISPRi_aa[c(R1, R2), ]
test2[i, c(1:8)]<-test1[1, c(2:4, 6:7, 9:11)]
test2[i, c(9:14)] <- test1$CTRL_GT_NORM
test2[i, 15] <- mean(test1$CTRL_GT_NORM[1:3])
test2[i, 16] <- mean(test1$CTRL_GT_NORM[4:6])
test2[i, 17] <- sd(test1$CTRL_GT_NORM[1:3])
test2[i, 18] <- sd(test1$CTRL_GT_NORM[4:6])
test2[i, 19] <- mean(test1$CTRL_GT_NORM[1:6])
test2[i, 20] <- median(test1$CTRL_GT_NORM[1:6])
test2[i, 21] <- sd(test1$CTRL_GT_NORM[1:6])
test2[i, c(22:27)] <- test1$AA_GT_NORM
test2[i, 28] <- mean(test1$AA_GT_NORM[1:3])
test2[i, 29] <- mean(test1$AA_GT_NORM[4:6])
test2[i, 30] <- sd(test1$AA_GT_NORM[1:3])
test2[i, 31] <- sd(test1$AA_GT_NORM[4:6])
test2[i, 32] <- mean(test1$AA_GT_NORM[1:6])
test2[i, 33] <- median(test1$AA_GT_NORM[1:6])
test2[i, 34] <- sd(test1$AA_GT_NORM[1:6])
test2[i, c(35:40)] <- test1$LPI_GT
test2[i, 41] <- mean(test1$LPI_GT[1:3])
test2[i, 42] <- mean(test1$LPI_GT[4:6])
test2[i, 43] <- sd(test1$LPI_GT[1:3])
test2[i, 44] <- sd(test1$LPI_GT[4:6])
test2[i, 45] <- mean(test1$LPI_GT[1:6])
test2[i, 46] <- median(test1$LPI_GT[1:6])
test2[i, 47] <- sd(test1$LPI_GT[1:6])
test2[i, c(48:53)] <- test1$CTRL_Y_NORM
test2[i, 54] <- mean(test1$CTRL_Y_NORM[1:3])
test2[i, 55] <- mean(test1$CTRL_Y_NORM[4:6])
test2[i, 56] <- sd(test1$CTRL_Y_NORM[1:3])
test2[i, 57] <- sd(test1$CTRL_Y_NORM[4:6])
test2[i, 58] <- mean(test1$CTRL_Y_NORM[1:6])
test2[i, 59] <- median(test1$CTRL_Y_NORM[1:6])
test2[i, 60] <- sd(test1$CTRL_Y_NORM[1:6])
test2[i, c(61:66)] <- test1$AA_Y_NORM
test2[i, 67] <- mean(test1$AA_Y_NORM[1:3])
test2[i, 68] <- mean(test1$AA_Y_NORM[4:6])
test2[i, 69] <- sd(test1$AA_Y_NORM[1:3])
test2[i, 70] <- sd(test1$AA_Y_NORM[4:6])
test2[i, 71] <- mean(test1$AA_Y_NORM[1:6])
test2[i, 72] <- median(test1$AA_Y_NORM[1:6])
test2[i, 73] <- sd(test1$AA_Y_NORM[1:6])
test2[i, c(74:79)] <- test1$LPI_Y
test2[i, 80] <- mean(test1$LPI_Y[1:3])
test2[i, 81] <- mean(test1$LPI_Y[4:6])
test2[i, 82] <- sd(test1$LPI_Y[1:3])
test2[i, 83] <- sd(test1$LPI_Y[4:6])
test2[i, 84] <- mean(test1$LPI_Y[1:6])
test2[i, 85] <- median(test1$LPI_Y[1:6])
test2[i, 86] <- sd(test1$LPI_Y[1:6])
}
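The loop above computes the same seven summary statistics for each of the six phenotypes. As a sketch, that repeated block could be expressed as one helper. The function name `summarise_replicates` is hypothetical, and it assumes, as the loop does, that each strain has exactly three replicates per round, ordered Round 1 first.

```r
# Hypothetical helper: summarise six replicate values of one phenotype,
# ordered as three Round 1 replicates followed by three Round 2 replicates.
summarise_replicates <- function(x) {
  stopifnot(length(x) == 6)
  r1 <- x[1:3]
  r2 <- x[4:6]
  c(RND1_MEAN   = mean(r1),
    RND2_MEAN   = mean(r2),
    RND1_SD     = sd(r1),
    RND2_SD     = sd(r2),
    RND1_2_MEAN = mean(x),
    RND1_2_MED  = median(x),
    RND1_2_SD   = sd(x))
}

# Invented LPI_GT values for a single strain
stats_example <- summarise_replicates(c(0.10, 0.20, 0.30, 0.00, 0.10, 0.20))
stats_example
```

The explicit length check makes strains with missing or extra replicates stop with an error instead of silently shifting values into the wrong columns.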
The column names are already stored in a text file available in the COMPILED_DATA folder. Assign them, then store the data.frame under a new name.
column_names <- read.table("COMPILED_DATA/Column_names.txt", header = FALSE, sep = "\t", as.is = TRUE)
colnames(test2) <- column_names$V1
Analysis_CRISPRi_aa_Complete <- test2
str(Analysis_CRISPRi_aa_Complete)
## 'data.frame': 9078 obs. of 86 variables:
## $ gRNA_name : chr "AAR2-NRg-3" "AAR2-NRg-4" "AAR2-TRg-15" "AAR2-TRg-16" ...
## $ Seq : chr "CCAGCGATAAGGAGGATCTT" "TGTGTCCTTTCTTCATCTCT" "AAAAGGAAAAAGTAATTAGG" "GTGAAAAGGAAAAAGTAATT" ...
## $ SOURCEPLATEID : chr "R2877.H.023" "R2877.H.024" "R2877.H.023" "R2877.H.023" ...
## $ SOURCECOLONYCOLUMN : int 5 21 22 20 21 18 6 7 9 8 ...
## $ SOURCECOLONYROW : chr "O" "L" "P" "N" ...
## $ GENE : chr "AAR2" "AAR2" "AAR2" "AAR2" ...
## $ Control.gRNA : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Location_1536 : chr "AC9" "W41" "AE43" "AA39" ...
## $ CTRL_GT_RND1_R1 : num 0.00397 0.00586 0.10456 -0.01142 0.0174 ...
## $ CTRL_GT_RND1_R2 : num -0.0086 0.00166 0.07118 0.00256 0.04094 ...
## $ CTRL_GT_RND1_R3 : num -0.000625 -0.043918 -0.009614 0.007969 -0.014018 ...
## $ CTRL_GT_RND2_R1 : num 0.0431 -0.0336 0.0292 0.0381 0.0742 ...
## $ CTRL_GT_RND2_R2 : num -0.0165 0.00744 0.01433 0.06599 -0.02973 ...
## $ CTRL_GT_RND2_R3 : num 0.03266 -0.04468 -0.04455 0.01994 -0.00924 ...
## $ CTRL_GT_RND1_MEAN : num -0.00175 -0.01213 0.05538 -0.0003 0.01477 ...
## $ CTRL_GT_RND2_MEAN : num 0.019756 -0.023599 -0.000341 0.041342 0.011747 ...
## $ CTRL_GT_RND1_SD : num 0.00636 0.02761 0.05871 0.01001 0.02757 ...
## $ CTRL_GT_RND2_SD : num 0.0318 0.0275 0.039 0.0232 0.0551 ...
## $ CTRL_GT_RND1_2_MEAN: num 0.009 -0.0179 0.0275 0.0205 0.0133 ...
## $ CTRL_GT_RND1_2_MED : num 0.00167 -0.01595 0.02176 0.01395 0.00408 ...
## $ CTRL_GT_RND1_2_SD : num 0.0237 0.0254 0.054 0.0278 0.039 ...
## $ AA_GT_RND1_R1 : num -0.0239 0.06 -0.0178 -0.1839 0.5993 ...
## $ AA_GT_RND1_R2 : num -0.0311 0.0763 -0.0145 -0.1293 0.2619 ...
## $ AA_GT_RND1_R3 : num 0.0315 -0.0191 -0.0778 -0.0514 0.3276 ...
## $ AA_GT_RND2_R1 : num 0.0142 -0.1375 0.111 0.0105 0.2021 ...
## $ AA_GT_RND2_R2 : num 0.0383 -0.2266 0.0588 0.0966 0.1831 ...
## $ AA_GT_RND2_R3 : num 0.0265 -0.2298 0.0789 0.0787 0.0114 ...
## $ AA_GT_RND1_MEAN : num -0.00781 0.03905 -0.03671 -0.12151 0.39624 ...
## $ AA_GT_RND2_MEAN : num 0.0263 -0.198 0.0829 0.0619 0.1322 ...
## $ AA_GT_RND1_SD : num 0.0343 0.051 0.0356 0.0666 0.1789 ...
## $ AA_GT_RND2_SD : num 0.0121 0.0524 0.0263 0.0455 0.105 ...
## $ AA_GT_RND1_2_MEAN : num 0.00925 -0.07945 0.02308 -0.02979 0.2642 ...
## $ AA_GT_RND1_2_MED : num 0.0203 -0.0783 0.0221 -0.0205 0.232 ...
## $ AA_GT_RND1_2_SD : num 0.0296 0.1378 0.0712 0.1127 0.1953 ...
## $ LPI_GT_RND1_R1 : num -0.0278 0.0541 -0.1224 -0.1724 0.5819 ...
## $ LPI_GT_RND1_R2 : num -0.0225 0.0746 -0.0857 -0.1318 0.2209 ...
## $ LPI_GT_RND1_R3 : num 0.0322 0.0248 -0.0682 -0.0594 0.3416 ...
## $ LPI_GT_RND2_R1 : num -0.0289 -0.1039 0.0818 -0.0276 0.1279 ...
## $ LPI_GT_RND2_R2 : num 0.0548 -0.2341 0.0444 0.0306 0.2128 ...
## $ LPI_GT_RND2_R3 : num -0.0062 -0.1851 0.1234 0.0588 0.0206 ...
## $ LPI_GT_RND1_MEAN : num -0.00605 0.05119 -0.09208 -0.12121 0.38147 ...
## $ LPI_GT_RND2_MEAN : num 0.00656 -0.17435 0.08321 0.02058 0.12042 ...
## $ LPI_GT_RND1_SD : num 0.0332 0.025 0.0276 0.0573 0.1837 ...
## $ LPI_GT_RND2_SD : num 0.0433 0.0657 0.0395 0.0441 0.0963 ...
## $ LPI_GT_RND1_2_MEAN : num 0.000251 -0.061582 -0.004437 -0.050313 0.250944 ...
## $ LPI_GT_RND1_2_MED : num -0.0143 -0.0396 -0.0119 -0.0435 0.2169 ...
## $ LPI_GT_RND1_2_SD : num 0.0352 0.1313 0.1007 0.0901 0.1941 ...
## $ CTRL_Y_RND1_R1 : num 0.055 0.0573 0.0189 0.0779 -0.1151 ...
## $ CTRL_Y_RND1_R2 : num 0.0131 0.0472 0.0129 0.0562 -0.0723 ...
## $ CTRL_Y_RND1_R3 : num 0.0306 0.0109 -0.0236 0.1208 -0.0171 ...
## $ CTRL_Y_RND2_R1 : num 0.0399 0.0113 -0.1422 0.0605 -0.0676 ...
## $ CTRL_Y_RND2_R2 : num 0.02531 0.0066 -0.17285 -0.00976 -0.08442 ...
## $ CTRL_Y_RND2_R3 : num -0.0528 0.0089 0.00598 0.06854 -0.08083 ...
## $ CTRL_Y_RND1_MEAN : num 0.03288 0.03847 0.00276 0.08497 -0.06817 ...
## $ CTRL_Y_RND2_MEAN : num 0.00414 0.00893 -0.10301 0.03977 -0.07762 ...
## $ CTRL_Y_RND1_SD : num 0.0211 0.0244 0.023 0.0329 0.0491 ...
## $ CTRL_Y_RND2_SD : num 0.04985 0.00234 0.09563 0.04308 0.00885 ...
## $ CTRL_Y_RND1_2_MEAN : num 0.0185 0.0237 -0.0501 0.0624 -0.0729 ...
## $ CTRL_Y_RND1_2_MED : num 0.028 0.0111 -0.0088 0.0645 -0.0766 ...
## $ CTRL_Y_RND1_2_SD : num 0.0377 0.0224 0.085 0.0423 0.032 ...
## $ AA_Y_RND1_R1 : num 0.0672 0.0106 0.1785 0.272 -2.1 ...
## $ AA_Y_RND1_R2 : num -0.3832 0.0102 0.1196 0.232 -1.1873 ...
## $ AA_Y_RND1_R3 : num -0.1599 -0.0465 -0.0381 0.1036 -1.0143 ...
## $ AA_Y_RND2_R1 : num 0.0503 0.3083 -0.2434 0.0968 -0.4223 ...
## $ AA_Y_RND2_R2 : num -0.0505 0.4391 -0.2526 -0.094 -0.304 ...
## $ AA_Y_RND2_R3 : num -0.00342 0.52795 -0.20049 -0.02924 -0.26528 ...
## $ AA_Y_RND1_MEAN : num -0.15864 -0.00857 0.08665 0.20251 -1.43385 ...
## $ AA_Y_RND2_MEAN : num -0.00121 0.42511 -0.23216 -0.00879 -0.33052 ...
## $ AA_Y_RND1_SD : num 0.2252 0.0329 0.112 0.088 0.5834 ...
## $ AA_Y_RND2_SD : num 0.0504 0.1105 0.0278 0.097 0.0818 ...
## $ AA_Y_RND1_2_MEAN : num -0.0799 0.2083 -0.0728 0.0969 -0.8822 ...
## $ AA_Y_RND1_2_MED : num -0.027 0.159 -0.119 0.1 -0.718 ...
## $ AA_Y_RND1_2_SD : num 0.17 0.248 0.189 0.142 0.71 ...
## $ LPI_Y_RND1_R1 : num 0.0122 -0.0467 0.1595 0.1942 -1.9849 ...
## $ LPI_Y_RND1_R2 : num -0.3963 -0.0369 0.1066 0.1757 -1.1149 ...
## $ LPI_Y_RND1_R3 : num -0.1905 -0.0575 -0.0145 -0.0172 -0.9972 ...
## $ LPI_Y_RND2_R1 : num 0.0104 0.297 -0.1012 0.0363 -0.3547 ...
## $ LPI_Y_RND2_R2 : num -0.0758 0.4325 -0.0797 -0.0842 -0.2196 ...
## $ LPI_Y_RND2_R3 : num 0.0494 0.519 -0.2065 -0.0978 -0.1844 ...
## $ LPI_Y_RND1_MEAN : num -0.1915 -0.047 0.0839 0.1175 -1.3657 ...
## $ LPI_Y_RND2_MEAN : num -0.00536 0.41618 -0.12915 -0.04856 -0.2529 ...
## $ LPI_Y_RND1_SD : num 0.2042 0.0103 0.0892 0.1171 0.5395 ...
## $ LPI_Y_RND2_SD : num 0.0641 0.1119 0.0678 0.0738 0.0899 ...
## $ LPI_Y_RND1_2_MEAN : num -0.0984 0.1846 -0.0226 0.0345 -0.8093 ...
## $ LPI_Y_RND1_2_MED : num -0.03273 0.13006 -0.04712 0.00953 -0.67593 ...
## $ LPI_Y_RND1_2_SD : num 0.169 0.263 0.137 0.126 0.701 ...
Multiple statistical methods were applied to identify the best-fitting statistical model for this dataset. We start with the complete dataset and copy it under a new name so that the original dataset is left unchanged.
Analysis_Final <- Analysis_CRISPRi_aa_Complete
For METHOD 1, we hypothesized that the difference between the mean (µ) phenotypic performance of a specific CRISPRi strain (StrainX) across the two independent experimental rounds (n=2) and the mean phenotypic performance of all the CRISPRi strains that fall within the interquartile range (IQR) of the complete dataset would be zero, and that any difference within the IQR would be due to chance alone.
Null Hypothesis : µ(µLPI_GT_StrainX_Round1, µLPI_GT_StrainX_Round2)- µ(InterquartileRange_LPI_GT) = 0
In this method, we estimate the mean and standard deviation (SD) of the LPI GT of Round 1 and Round 2 separately for each strain. When one or two of the three replicates of a strain in a round returned a missing value (i.e. NA), the mean / SD of LPI GT for that round was calculated from the non-NA replicates only. The mean and SD statistics were therefore recalculated with the missing values excluded. We implemented an if-else decision tree for this, and the same procedure is applied in turn to the CTRL GT, AA GT and LPI GT columns:
for(i in 1:nrow(Analysis_Final)){
  # Non-missing CTRL GT replicates of round 1 (columns 9:11) and round 2 (columns 12:14)
  x1 <- as.numeric(Analysis_Final[i, 9:11][which(!is.na(Analysis_Final[i, 9:11]))])
  x2 <- as.numeric(Analysis_Final[i, 12:14][which(!is.na(Analysis_Final[i, 12:14]))])
  # A round mean is NA only when every replicate of that round is missing
  if(length(x1)==0){
    Analysis_Final$CTRL_GT_RND1_MEAN[i] <- NA
  } else{
    Analysis_Final$CTRL_GT_RND1_MEAN[i] <- as.numeric(mean(x1))
  }
  if(length(x2)==0){
    Analysis_Final$CTRL_GT_RND2_MEAN[i] <- NA
  } else{
    Analysis_Final$CTRL_GT_RND2_MEAN[i] <- as.numeric(mean(x2))
  }
  # Mean and SD of the two round means, defined only when both round means exist
  if(sum(is.na(c(Analysis_Final$CTRL_GT_RND1_MEAN[i], Analysis_Final$CTRL_GT_RND2_MEAN[i])))==0){
    Analysis_Final$CTRL_GT_RND1_2_MEAN[i] <- as.numeric(mean(c(Analysis_Final$CTRL_GT_RND1_MEAN[i], Analysis_Final$CTRL_GT_RND2_MEAN[i])))
    Analysis_Final[i, 87] <- as.numeric(sd(c(Analysis_Final$CTRL_GT_RND1_MEAN[i], Analysis_Final$CTRL_GT_RND2_MEAN[i])))
  } else{
    Analysis_Final$CTRL_GT_RND1_2_MEAN[i] <- NA
    Analysis_Final[i, 87] <- NA
  }
}
colnames(Analysis_Final)[87] <- "CTRL_GT_MEAN_RND1_2_SD"
for(i in 1:nrow(Analysis_Final)){
  x1 <- as.numeric(Analysis_Final[i, 22:24][which(!is.na(Analysis_Final[i, 22:24]))])
  x2 <- as.numeric(Analysis_Final[i, 25:27][which(!is.na(Analysis_Final[i, 25:27]))])
  if(length(x1)==0){
    Analysis_Final$AA_GT_RND1_MEAN[i] <- NA
  } else{
    Analysis_Final$AA_GT_RND1_MEAN[i] <- as.numeric(mean(x1))
  }
  if(length(x2)==0){
    Analysis_Final$AA_GT_RND2_MEAN[i] <- NA
  } else{
    Analysis_Final$AA_GT_RND2_MEAN[i] <- as.numeric(mean(x2))
  }
  if(sum(is.na(c(Analysis_Final$AA_GT_RND1_MEAN[i], Analysis_Final$AA_GT_RND2_MEAN[i])))==0){
    Analysis_Final$AA_GT_RND1_2_MEAN[i] <- as.numeric(mean(c(Analysis_Final$AA_GT_RND1_MEAN[i], Analysis_Final$AA_GT_RND2_MEAN[i])))
    Analysis_Final[i, 88] <- as.numeric(sd(c(Analysis_Final$AA_GT_RND1_MEAN[i], Analysis_Final$AA_GT_RND2_MEAN[i])))
  } else{
    Analysis_Final$AA_GT_RND1_2_MEAN[i] <- NA
    Analysis_Final[i, 88] <- NA
  }
}
colnames(Analysis_Final)[88] <- "AA_GT_MEAN_RND1_2_SD"
for(i in 1:nrow(Analysis_Final)){
  x1 <- as.numeric(Analysis_Final[i, 35:37][which(!is.na(Analysis_Final[i, 35:37]))])
  x2 <- as.numeric(Analysis_Final[i, 38:40][which(!is.na(Analysis_Final[i, 38:40]))])
  if(length(x1)==0){
    Analysis_Final$LPI_GT_RND1_MEAN[i] <- NA
  } else{
    Analysis_Final$LPI_GT_RND1_MEAN[i] <- as.numeric(mean(x1))
  }
  if(length(x2)==0){
    Analysis_Final$LPI_GT_RND2_MEAN[i] <- NA
  } else{
    Analysis_Final$LPI_GT_RND2_MEAN[i] <- as.numeric(mean(x2))
  }
  if(sum(is.na(c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i])))==0){
    Analysis_Final$LPI_GT_RND1_2_MEAN[i] <- as.numeric(mean(c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i])))
    Analysis_Final[i, 89] <- as.numeric(sd(c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i])))
  } else{
    Analysis_Final$LPI_GT_RND1_2_MEAN[i] <- NA
    Analysis_Final[i, 89] <- NA
  }
}
colnames(Analysis_Final)[89] <- "LPI_GT_MEAN_RND1_2_SD"
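The row-by-row recalculation above can also be written in vectorized form. The following is a minimal sketch on toy data, not the code used for the screen; the data frame `toy`, its column names and the helper `row_mean_or_na` are hypothetical stand-ins introduced only for illustration.

```r
# Toy data: 3 strains x 6 replicate columns (3 per round); NA = no growth.
# All names here are hypothetical stand-ins for the real table.
toy <- data.frame(
  R1_1 = c(0.10, NA, NA),
  R1_2 = c(0.20, 0.40, NA),
  R1_3 = c(0.30, NA, NA),
  R2_1 = c(0.00, 0.10, 0.50),
  R2_2 = c(0.10, 0.30, 0.70),
  R2_3 = c(0.20, NA, NA)
)

# Mean of the non-missing values in each row; NA when every value is missing.
# rowMeans(..., na.rm = TRUE) returns NaN for an all-NA row, so map it to NA.
row_mean_or_na <- function(m) {
  out <- rowMeans(as.matrix(m), na.rm = TRUE)
  out[is.nan(out)] <- NA_real_
  out
}

rnd1_mean <- row_mean_or_na(toy[, 1:3])
rnd2_mean <- row_mean_or_na(toy[, 4:6])

# Across-round mean and SD are only defined when both round means exist.
both_ok <- !is.na(rnd1_mean) & !is.na(rnd2_mean)
rnd1_2_mean <- ifelse(both_ok, (rnd1_mean + rnd2_mean) / 2, NA_real_)
# The SD of two values a and b equals |a - b| / sqrt(2).
rnd1_2_sd <- ifelse(both_ok, abs(rnd1_mean - rnd2_mean) / sqrt(2), NA_real_)

print(data.frame(rnd1_mean, rnd2_mean, rnd1_2_mean, rnd1_2_sd))
```

The third toy strain has no round 1 data, so its across-round mean and SD stay NA, which mirrors the else branch of the loops above.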
BOX PLOT - MEAN RELATIVE GENERATION TIME (LPI GT)
Figure 2: Boxplot of mean relative generation time (LPI GT) for all strains in the library
Display Box-plot statistics
box_stat_LPI_GT_R1_2_mean$stats
## [,1]
## [1,] -0.16933911
## [2,] -0.02428792
## [3,] 0.02084505
## [4,] 0.07255828
## [5,] 0.21771505
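We therefore extract the data points that lie within the IQR, i.e. between the lower and upper hinges of the box plot (rows 2 and 4 of the statistics above):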
Intermediate_50 <- Analysis_Final$LPI_GT_RND1_2_MEAN[which(Analysis_Final$LPI_GT_RND1_2_MEAN>=-0.02428792
&Analysis_Final$LPI_GT_RND1_2_MEAN<=0.07255828)]
summary(Intermediate_50)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0242879 -0.0009547 0.0208382 0.0219831 0.0445144 0.0725032
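The hinge values in the extraction above are typed in by hand from the box-plot statistics. As a hedged alternative, they could be read programmatically with `boxplot.stats()`, which avoids rounding and copy errors. The sketch below uses simulated values; `lpi_gt_mean` is a hypothetical stand-in for the per-strain mean LPI GT column.

```r
set.seed(1)
# Hypothetical stand-in for the per-strain mean LPI GT column (with two NAs).
lpi_gt_mean <- c(rnorm(200, mean = 0.02, sd = 0.08), NA, NA)

# boxplot.stats() drops NAs and returns five numbers:
# lower whisker, lower hinge, median, upper hinge, upper whisker.
bp <- boxplot.stats(lpi_gt_mean)
lower_hinge <- bp$stats[2]
upper_hinge <- bp$stats[4]

# Keep the values between the hinges; which() discards NA comparisons.
keep <- which(lpi_gt_mean >= lower_hinge & lpi_gt_mean <= upper_hinge)
intermediate_50 <- lpi_gt_mean[keep]
summary(intermediate_50)
```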
P-values are estimated with Welch's two-sample, two-sided t-test (an adaptation of Student's t-test)
for(i in 1:nrow(Analysis_Final)){
if(sum(is.na(c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i])))==0){
P_value <- t.test(Intermediate_50, c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i]))
Analysis_Final[i, 90] <- P_value$p.value
} else{
Analysis_Final[i, 90] <- NA
}
}
colnames(Analysis_Final)[90] <- "P_value_M1"
P-value adjustment by BENJAMINI-HOCHBERG False Discovery Rate (FDR) method
Analysis_Final[which(!is.na(Analysis_Final$P_value_M1)), 91] <- p.adjust(Analysis_Final$P_value_M1[which(!is.na(Analysis_Final$P_value_M1))],
method = "BH",
n = length(Analysis_Final$P_value_M1[which(!is.na(Analysis_Final$P_value_M1))]))
colnames(Analysis_Final)[91] <- "P.adjusted_M1"
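The adjustment step above subsets the non-missing P-values explicitly. As a small illustration on made-up P-values (not screen data), `p.adjust()` can also be given the vector with its NAs in place: it passes them through and, with the default `n`, bases the correction on the non-missing values only, so both routes give the same result.

```r
# Hypothetical raw p-values; NA marks strains that could not be tested.
p_raw <- c(0.001, 0.010, NA, 0.040, 0.300, NA, 0.800)

# Explicit approach: adjust only the non-missing entries.
p_adj_explicit <- rep(NA_real_, length(p_raw))
ok <- !is.na(p_raw)
p_adj_explicit[ok] <- p.adjust(p_raw[ok], method = "BH")

# Direct call on the full vector, NAs included.
p_adj_direct <- p.adjust(p_raw, method = "BH")

# Count significant entries while ignoring NAs.
n_sig <- sum(p_adj_direct <= 0.05, na.rm = TRUE)
print(cbind(p_raw, p_adj_explicit, p_adj_direct))
print(n_sig)
```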
NUMBER OF SIGNIFICANT STRAINS
length(Analysis_Final$P_value_M1[which(Analysis_Final$P_value_M1<=0.05)])
## [1] 434
length(Analysis_Final$P.adjusted_M1[which(Analysis_Final$P.adjusted_M1<=0.05)])
## [1] 66
length(Analysis_Final$P_value_M1[which(Analysis_Final$P_value_M1<=0.1)])
## [1] 842
length(Analysis_Final$P.adjusted_M1[which(Analysis_Final$P.adjusted_M1<=0.1)])
## [1] 71
P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS
Figure 3: P-value diagnostic by histogram Method 1
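The plotting code for the diagnostic histograms is not shown in this report. A minimal sketch of how such a diagnostic could be produced is given here on simulated P-values; the mixture of 900 null and 100 non-null tests is an assumption chosen purely for illustration.

```r
set.seed(42)
# Simulated p-values: 900 true nulls (uniform) plus 100 true effects (near 0).
# Purely illustrative; not the screen data.
p_raw <- c(runif(900), rbeta(100, shape1 = 0.5, shape2 = 20))
p_adj <- p.adjust(p_raw, method = "BH")

# Bin into 20 equal-width bins without drawing, so the counts can be inspected.
h_raw <- hist(p_raw, breaks = seq(0, 1, by = 0.05), plot = FALSE)
h_adj <- hist(p_adj, breaks = seq(0, 1, by = 0.05), plot = FALSE)

# A well-behaved raw histogram is roughly flat with a peak in the first bin.
print(h_raw$counts)
print(h_adj$counts)

# To draw the figures, call plot(h_raw) and plot(h_adj).
```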
Method 1 was too rigid because n=2. Even a small standard deviation between Round 1 and Round 2 renders an observation non-significant. The method puts the whole weight on the variability between Round 1 and Round 2 rather than on the deviation from the mean of the intermediate 50%. Statistical Method 1 was therefore discarded after careful evaluation.
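The sensitivity of Method 1 to the spread between the two rounds can be illustrated with a toy example. The reference vector and the two strains below are hypothetical; both strains have the same mean shift, and only the agreement between their round means differs.

```r
set.seed(7)
# Hypothetical reference distribution standing in for the intermediate 50%.
reference <- runif(4000, min = -0.024, max = 0.073)

# Two hypothetical strains with the same large mean (0.40)
# but different agreement between the two round means.
tight <- c(0.399, 0.401)   # rounds agree closely
loose <- c(0.30, 0.50)     # rounds disagree moderately

p_tight <- t.test(reference, tight)$p.value
p_loose <- t.test(reference, loose)$p.value

# With only two values in the strain sample, the Welch degrees of freedom
# are close to 1, so the p-value depends heavily on the between-round spread.
print(c(p_tight = p_tight, p_loose = p_loose))
```

In this sketch the strain whose rounds disagree fails the 0.05 cut-off even though its mean is far outside the reference range.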
For METHOD 2, we hypothesized that the difference between the mean (µ) phenotypic performance (LPI GT) of a specific CRISPRi strain (StrainX) in an independent experimental round (each with three technical replicates, i.e. n=3) and the mean phenotypic performance of all the replicates of the CRISPRi control strains (with gRNA targeting no genetic locus in S. cerevisiae) in that same screening round would be zero, and that any difference within the phenotypic performance range of the CRISPRi control strains (LPI GT range) would be due to chance alone.
Null Hypothesis : µStrainX(LPI_GTReplica1, LPI_GTReplica2, LPI_GTReplica3)- µCRISPRi_Control_Strains(LPI_GT) = 0
In this method, P-values were estimated for each strain in each round, and only strains that showed a significant effect in both rounds were considered for further analysis.
First we copy the dataset under a new name so that the earlier results are not modified further down the line.
Analysis_Final_2 <- Analysis_Final
str(Analysis_Final_2)
## 'data.frame': 9078 obs. of 91 variables:
## $ gRNA_name : chr "AAR2-NRg-3" "AAR2-NRg-4" "AAR2-TRg-15" "AAR2-TRg-16" ...
## $ Seq : chr "CCAGCGATAAGGAGGATCTT" "TGTGTCCTTTCTTCATCTCT" "AAAAGGAAAAAGTAATTAGG" "GTGAAAAGGAAAAAGTAATT" ...
## $ SOURCEPLATEID : chr "R2877.H.023" "R2877.H.024" "R2877.H.023" "R2877.H.023" ...
## $ SOURCECOLONYCOLUMN : int 5 21 22 20 21 18 6 7 9 8 ...
## $ SOURCECOLONYROW : chr "O" "L" "P" "N" ...
## $ GENE : chr "AAR2" "AAR2" "AAR2" "AAR2" ...
## $ Control.gRNA : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Location_1536 : chr "AC9" "W41" "AE43" "AA39" ...
## $ CTRL_GT_RND1_R1 : num 0.00397 0.00586 0.10456 -0.01142 0.0174 ...
## $ CTRL_GT_RND1_R2 : num -0.0086 0.00166 0.07118 0.00256 0.04094 ...
## $ CTRL_GT_RND1_R3 : num -0.000625 -0.043918 -0.009614 0.007969 -0.014018 ...
## $ CTRL_GT_RND2_R1 : num 0.0431 -0.0336 0.0292 0.0381 0.0742 ...
## $ CTRL_GT_RND2_R2 : num -0.0165 0.00744 0.01433 0.06599 -0.02973 ...
## $ CTRL_GT_RND2_R3 : num 0.03266 -0.04468 -0.04455 0.01994 -0.00924 ...
## $ CTRL_GT_RND1_MEAN : num -0.00175 -0.01213 0.05538 -0.0003 0.01477 ...
## $ CTRL_GT_RND2_MEAN : num 0.019756 -0.023599 -0.000341 0.041342 0.011747 ...
## $ CTRL_GT_RND1_SD : num 0.00636 0.02761 0.05871 0.01001 0.02757 ...
## $ CTRL_GT_RND2_SD : num 0.0318 0.0275 0.039 0.0232 0.0551 ...
## $ CTRL_GT_RND1_2_MEAN : num 0.009 -0.0179 0.0275 0.0205 0.0133 ...
## $ CTRL_GT_RND1_2_MED : num 0.00167 -0.01595 0.02176 0.01395 0.00408 ...
## $ CTRL_GT_RND1_2_SD : num 0.0237 0.0254 0.054 0.0278 0.039 ...
## $ AA_GT_RND1_R1 : num -0.0239 0.06 -0.0178 -0.1839 0.5993 ...
## $ AA_GT_RND1_R2 : num -0.0311 0.0763 -0.0145 -0.1293 0.2619 ...
## $ AA_GT_RND1_R3 : num 0.0315 -0.0191 -0.0778 -0.0514 0.3276 ...
## $ AA_GT_RND2_R1 : num 0.0142 -0.1375 0.111 0.0105 0.2021 ...
## $ AA_GT_RND2_R2 : num 0.0383 -0.2266 0.0588 0.0966 0.1831 ...
## $ AA_GT_RND2_R3 : num 0.0265 -0.2298 0.0789 0.0787 0.0114 ...
## $ AA_GT_RND1_MEAN : num -0.00781 0.03905 -0.03671 -0.12151 0.39624 ...
## $ AA_GT_RND2_MEAN : num 0.0263 -0.198 0.0829 0.0619 0.1322 ...
## $ AA_GT_RND1_SD : num 0.0343 0.051 0.0356 0.0666 0.1789 ...
## $ AA_GT_RND2_SD : num 0.0121 0.0524 0.0263 0.0455 0.105 ...
## $ AA_GT_RND1_2_MEAN : num 0.00925 -0.07945 0.02308 -0.02979 0.2642 ...
## $ AA_GT_RND1_2_MED : num 0.0203 -0.0783 0.0221 -0.0205 0.232 ...
## $ AA_GT_RND1_2_SD : num 0.0296 0.1378 0.0712 0.1127 0.1953 ...
## $ LPI_GT_RND1_R1 : num -0.0278 0.0541 -0.1224 -0.1724 0.5819 ...
## $ LPI_GT_RND1_R2 : num -0.0225 0.0746 -0.0857 -0.1318 0.2209 ...
## $ LPI_GT_RND1_R3 : num 0.0322 0.0248 -0.0682 -0.0594 0.3416 ...
## $ LPI_GT_RND2_R1 : num -0.0289 -0.1039 0.0818 -0.0276 0.1279 ...
## $ LPI_GT_RND2_R2 : num 0.0548 -0.2341 0.0444 0.0306 0.2128 ...
## $ LPI_GT_RND2_R3 : num -0.0062 -0.1851 0.1234 0.0588 0.0206 ...
## $ LPI_GT_RND1_MEAN : num -0.00605 0.05119 -0.09208 -0.12121 0.38147 ...
## $ LPI_GT_RND2_MEAN : num 0.00656 -0.17435 0.08321 0.02058 0.12042 ...
## $ LPI_GT_RND1_SD : num 0.0332 0.025 0.0276 0.0573 0.1837 ...
## $ LPI_GT_RND2_SD : num 0.0433 0.0657 0.0395 0.0441 0.0963 ...
## $ LPI_GT_RND1_2_MEAN : num 0.000251 -0.061582 -0.004437 -0.050313 0.250944 ...
## $ LPI_GT_RND1_2_MED : num -0.0143 -0.0396 -0.0119 -0.0435 0.2169 ...
## $ LPI_GT_RND1_2_SD : num 0.0352 0.1313 0.1007 0.0901 0.1941 ...
## $ CTRL_Y_RND1_R1 : num 0.055 0.0573 0.0189 0.0779 -0.1151 ...
## $ CTRL_Y_RND1_R2 : num 0.0131 0.0472 0.0129 0.0562 -0.0723 ...
## $ CTRL_Y_RND1_R3 : num 0.0306 0.0109 -0.0236 0.1208 -0.0171 ...
## $ CTRL_Y_RND2_R1 : num 0.0399 0.0113 -0.1422 0.0605 -0.0676 ...
## $ CTRL_Y_RND2_R2 : num 0.02531 0.0066 -0.17285 -0.00976 -0.08442 ...
## $ CTRL_Y_RND2_R3 : num -0.0528 0.0089 0.00598 0.06854 -0.08083 ...
## $ CTRL_Y_RND1_MEAN : num 0.03288 0.03847 0.00276 0.08497 -0.06817 ...
## $ CTRL_Y_RND2_MEAN : num 0.00414 0.00893 -0.10301 0.03977 -0.07762 ...
## $ CTRL_Y_RND1_SD : num 0.0211 0.0244 0.023 0.0329 0.0491 ...
## $ CTRL_Y_RND2_SD : num 0.04985 0.00234 0.09563 0.04308 0.00885 ...
## $ CTRL_Y_RND1_2_MEAN : num 0.0185 0.0237 -0.0501 0.0624 -0.0729 ...
## $ CTRL_Y_RND1_2_MED : num 0.028 0.0111 -0.0088 0.0645 -0.0766 ...
## $ CTRL_Y_RND1_2_SD : num 0.0377 0.0224 0.085 0.0423 0.032 ...
## $ AA_Y_RND1_R1 : num 0.0672 0.0106 0.1785 0.272 -2.1 ...
## $ AA_Y_RND1_R2 : num -0.3832 0.0102 0.1196 0.232 -1.1873 ...
## $ AA_Y_RND1_R3 : num -0.1599 -0.0465 -0.0381 0.1036 -1.0143 ...
## $ AA_Y_RND2_R1 : num 0.0503 0.3083 -0.2434 0.0968 -0.4223 ...
## $ AA_Y_RND2_R2 : num -0.0505 0.4391 -0.2526 -0.094 -0.304 ...
## $ AA_Y_RND2_R3 : num -0.00342 0.52795 -0.20049 -0.02924 -0.26528 ...
## $ AA_Y_RND1_MEAN : num -0.15864 -0.00857 0.08665 0.20251 -1.43385 ...
## $ AA_Y_RND2_MEAN : num -0.00121 0.42511 -0.23216 -0.00879 -0.33052 ...
## $ AA_Y_RND1_SD : num 0.2252 0.0329 0.112 0.088 0.5834 ...
## $ AA_Y_RND2_SD : num 0.0504 0.1105 0.0278 0.097 0.0818 ...
## $ AA_Y_RND1_2_MEAN : num -0.0799 0.2083 -0.0728 0.0969 -0.8822 ...
## $ AA_Y_RND1_2_MED : num -0.027 0.159 -0.119 0.1 -0.718 ...
## $ AA_Y_RND1_2_SD : num 0.17 0.248 0.189 0.142 0.71 ...
## $ LPI_Y_RND1_R1 : num 0.0122 -0.0467 0.1595 0.1942 -1.9849 ...
## $ LPI_Y_RND1_R2 : num -0.3963 -0.0369 0.1066 0.1757 -1.1149 ...
## $ LPI_Y_RND1_R3 : num -0.1905 -0.0575 -0.0145 -0.0172 -0.9972 ...
## $ LPI_Y_RND2_R1 : num 0.0104 0.297 -0.1012 0.0363 -0.3547 ...
## $ LPI_Y_RND2_R2 : num -0.0758 0.4325 -0.0797 -0.0842 -0.2196 ...
## $ LPI_Y_RND2_R3 : num 0.0494 0.519 -0.2065 -0.0978 -0.1844 ...
## $ LPI_Y_RND1_MEAN : num -0.1915 -0.047 0.0839 0.1175 -1.3657 ...
## $ LPI_Y_RND2_MEAN : num -0.00536 0.41618 -0.12915 -0.04856 -0.2529 ...
## $ LPI_Y_RND1_SD : num 0.2042 0.0103 0.0892 0.1171 0.5395 ...
## $ LPI_Y_RND2_SD : num 0.0641 0.1119 0.0678 0.0738 0.0899 ...
## $ LPI_Y_RND1_2_MEAN : num -0.0984 0.1846 -0.0226 0.0345 -0.8093 ...
## $ LPI_Y_RND1_2_MED : num -0.03273 0.13006 -0.04712 0.00953 -0.67593 ...
## $ LPI_Y_RND1_2_SD : num 0.169 0.263 0.137 0.126 0.701 ...
## $ CTRL_GT_MEAN_RND1_2_SD: num 0.01521 0.00811 0.0394 0.02945 0.00214 ...
## $ AA_GT_MEAN_RND1_2_SD : num 0.0241 0.1676 0.0846 0.1297 0.1867 ...
## $ LPI_GT_MEAN_RND1_2_SD : num 0.00892 0.15948 0.12395 0.10026 0.18459 ...
## $ P_value_M1 : num 0.178 0.594 0.814 0.494 0.33 ...
## $ P.adjusted_M1 : num 0.91 0.911 0.952 0.91 0.91 ...
Extract the LPI GT data of the CRISPRi control strains from Round 1 and Round 2, respectively, and store the output in two separate vectors.
CRISPRi_Ctrl_Round1 <- whole_data_CRISPRi_aa_2$LPI_GT[which(whole_data_CRISPRi_aa_2$Control.gRNA == 1
& whole_data_CRISPRi_aa_2$Round_ID=="1st_round")]
summary(CRISPRi_Ctrl_Round1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.06721 0.09108 0.16819 0.15842 0.20944 0.38829
CRISPRi_Ctrl_Round2 <- whole_data_CRISPRi_aa_2$LPI_GT[which(whole_data_CRISPRi_aa_2$Control.gRNA == 1
& whole_data_CRISPRi_aa_2$Round_ID=="2nd_round")]
summary(CRISPRi_Ctrl_Round2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.21609 -0.07291 -0.02904 -0.01828 0.04256 0.15629
P-values are estimated with Welch's two-sample, two-sided t-test (an adaptation of Student's t-test)
for(i in 1:nrow(Analysis_Final_2)){
  # LPI GT replicates of round 1 (columns 35:37) and round 2 (columns 38:40)
  test1 <- t(Analysis_Final_2[i, 35:37])
  test2 <- t(Analysis_Final_2[i, 38:40])
  # A round is tested only when at least two replicates are non-missing
  if(sum(!is.na(test1[, 1]))>=2){
    P_value_RND1 <- t.test(CRISPRi_Ctrl_Round1, test1[which(!is.na(test1[, 1]))])
    Analysis_Final_2[i, 92] <- P_value_RND1$p.value
  } else {
    Analysis_Final_2[i, 92] <- NA
  }
  if(sum(!is.na(test2[, 1]))>=2){
    P_value_RND2 <- t.test(CRISPRi_Ctrl_Round2, test2[which(!is.na(test2[, 1]))])
    Analysis_Final_2[i, 93] <- P_value_RND2$p.value
  } else {
    Analysis_Final_2[i, 93] <- NA
  }
}
colnames(Analysis_Final_2)[92:93] <- c("P_value_RND1_M2", "P_value_RND2_M2")
P-value adjustment by BENJAMINI-HOCHBERG False Discovery Rate (FDR) method
Analysis_Final_2[which(!is.na(Analysis_Final_2$P_value_RND1_M2)), 94] <- p.adjust(Analysis_Final_2$P_value_RND1_M2[which(!is.na(Analysis_Final_2$P_value_RND1_M2))],
method = "BH",
n = length(Analysis_Final_2$P_value_RND1_M2[which(!is.na(Analysis_Final_2$P_value_RND1_M2))]))
Analysis_Final_2[which(!is.na(Analysis_Final_2$P_value_RND2_M2)), 95] <- p.adjust(Analysis_Final_2$P_value_RND2_M2[which(!is.na(Analysis_Final_2$P_value_RND2_M2))],
method = "BH",
n = length(Analysis_Final_2$P_value_RND2_M2[which(!is.na(Analysis_Final_2$P_value_RND2_M2))]))
colnames(Analysis_Final_2)[94:95] <- c("P.adjusted_RND1_M2", "P.adjusted_RND2_M2")
NUMBER OF SIGNIFICANT STRAINS
length(Analysis_Final_2$P_value_RND1_M2[which(Analysis_Final_2$P_value_RND1_M2<=0.05)])
## [1] 4601
length(Analysis_Final_2$P.adjusted_RND1_M2[which(Analysis_Final_2$P.adjusted_RND1_M2<=0.05)])
## [1] 3389
length(Analysis_Final_2$P_value_RND1_M2[which(Analysis_Final_2$P_value_RND1_M2<=0.1)])
## [1] 5635
length(Analysis_Final_2$P.adjusted_RND1_M2[which(Analysis_Final_2$P.adjusted_RND1_M2<=0.1)])
## [1] 4692
P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS ROUND 1
Figure 4: P-value diagnostic by histogram, Method 2, Round 1
NUMBER OF SIGNIFICANT STRAINS
length(Analysis_Final_2$P_value_RND2_M2[which(Analysis_Final_2$P_value_RND2_M2<=0.05)])
## [1] 2304
length(Analysis_Final_2$P.adjusted_RND2_M2[which(Analysis_Final_2$P.adjusted_RND2_M2<=0.05)])
## [1] 987
length(Analysis_Final_2$P_value_RND2_M2[which(Analysis_Final_2$P_value_RND2_M2<=0.1)])
## [1] 3174
length(Analysis_Final_2$P.adjusted_RND2_M2[which(Analysis_Final_2$P.adjusted_RND2_M2<=0.1)])
## [1] 1431
P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS ROUND 2
Figure 5: P-value diagnostic by histogram, Method 2, Round 2
Method 2 is a robust statistical method. However, a major problem with it is that different thresholds for the adjusted P-values and the mean LPI GT have to be set for each round.
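The Method 2 requirement that a strain be significant in both rounds can be sketched as a logical AND over the two adjusted P-value columns. The data frame, its column names and the single cut-off `alpha` below are assumptions for illustration; the report does not state which per-round thresholds would have been used.

```r
# Hypothetical adjusted p-values for five strains in two rounds.
padj <- data.frame(
  strain    = c("s1", "s2", "s3", "s4", "s5"),
  padj_rnd1 = c(0.001, 0.030, 0.200, NA,    0.040),
  padj_rnd2 = c(0.020, 0.300, 0.010, 0.010, 0.049),
  stringsAsFactors = FALSE
)
alpha <- 0.05  # assumed cut-off, applied to both rounds here

# A strain passes only if it is significant in both rounds;
# a missing p-value in either round counts as not significant.
sig_both <- !is.na(padj$padj_rnd1) & !is.na(padj$padj_rnd2) &
  padj$padj_rnd1 <= alpha & padj$padj_rnd2 <= alpha

print(padj$strain[sig_both])
```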
For METHOD 3, we hypothesized that the difference between the mean (µ) phenotypic performance of a specific CRISPRi strain (StrainX), considering all technical replicates (3 per round) in the two independent experimental rounds (i.e. n=6), and the mean phenotypic performance of all the CRISPRi strains that fall within the interquartile range (IQR) of the complete dataset would be zero, and that any difference within the IQR would be due to chance alone.
Null Hypothesis : µStrainX(All_replicates_LPI_GT)- µ(InterquartileRange_LPI_GT) = 0
Additionally, we tested one final statistical model to determine the significance of our observations.
For METHOD 4, we hypothesized that the difference between the mean (µ) phenotypic performance of a specific CRISPRi strain (StrainX), considering all technical replicates (3 per round) in the two independent experimental rounds (i.e. n=6), and the mean phenotypic performance of all the CRISPRi control strains (with gRNA targeting no genetic locus in S. cerevisiae) would be zero, and that any difference within the phenotypic performance range of the CRISPRi control strains (LPI GT range) would be due to chance alone.
Null Hypothesis : µStrainX(All_replicates_LPI_GT) - µCRISPRi_Control_Strains(LPI_GT) = 0
To ensure that the earlier results are not modified, we copy the analysis dataset under a new name.
Analysis_Final_3 <- Analysis_Final_2
Since we consider all replicates this time, for Method 3 we compare each strain with all replicates (NOT the means) that fall within the IQR. For this purpose, we extract the IQR dataset including all the replicate data for each strain. We use the data.frame Data_CRISPRi_aa (see REMOVE ROWS WITH SPATIAL CONTROL STRAIN DATA) to extract this numeric vector.
BOX PLOT - RELATIVE GENERATION TIME (LPI GT)
Figure 6: Boxplot of relative generation time (LPI GT) for all strains including all replicates in the library
Display Box-plot statistics
boxplot_stat_LPI_GT$stats
## [,1]
## [1,] -0.25630118
## [2,] -0.04373846
## [3,] 0.02045212
## [4,] 0.09804938
## [5,] 0.31050530
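We therefore extract the data points that lie within the IQR, i.e. between the lower and upper hinges of the box plot (rows 2 and 4 of the statistics above):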
Intermediate_50_M3 <- Data_CRISPRi_aa$LPI_GT[which(Data_CRISPRi_aa$LPI_GT >=-0.04373846
&Data_CRISPRi_aa$LPI_GT<=0.09804938)]
summary(Intermediate_50_M3)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.04374 -0.01004 0.02045 0.02236 0.05325 0.09805
For Method 4, we extract all the replicate data (not the mean) of each of the CRISPRi control strains.
Crispri_control_M4 <- Data_CRISPRi_aa$LPI_GT[which(Data_CRISPRi_aa$Control.gRNA==1)]
summary(Crispri_control_M4)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.21609 -0.03042 0.06889 0.07007 0.16628 0.38829
We recalculate the above parameters taking all six replicates into account and excluding the missing values. For this purpose we use an if-else decision tree. This means we get an LSC GT / LPI GT value if at least one replicate managed to grow in a particular condition; otherwise the result is a missing value (NA).
Additionally, we create two columns that show the number of replicates of a strain that managed to grow in the basal condition (n_CTRL) and in the acetic acid condition (n_LPI).
for(i in 1:nrow(Analysis_Final_3)){
  # All six CTRL GT replicates (columns 9:14) and all six LPI GT replicates (columns 35:40)
  test1 <- t(Analysis_Final_3[i, 9:14])
  test2 <- t(Analysis_Final_3[i, 35:40])
  # Number of non-missing replicates in each set
  x1 <- sum(!is.na(test1[, 1]))
  x2 <- sum(!is.na(test2[, 1]))
  # Mean over the non-missing replicates
  CTRL_GT_Mean_temp <- mean(test1[which(!is.na(test1[, 1]))])
  LPI_GT_Mean_temp <- mean(test2[which(!is.na(test2[, 1]))])
  Analysis_Final_3[i, 96] <- CTRL_GT_Mean_temp
  Analysis_Final_3[i, 97] <- x1
  Analysis_Final_3[i, 98] <- LPI_GT_Mean_temp
  Analysis_Final_3[i, 99] <- x2
}
colnames(Analysis_Final_3)[96:99] <- c("CTRL_GT_Mean_all", "n_CTRL", "LPI_GT_Mean_all", "n_LPI")
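The replicate counts and pooled means can also be obtained without a loop. The sketch below uses a small made-up matrix (`reps` is hypothetical). Note that the mean of zero values in R is NaN, so the case where no replicate grew is set to NA explicitly here.

```r
# Toy matrix: 3 strains x 6 replicates (hypothetical values); NA = no growth.
reps <- rbind(
  c(0.10, 0.20, 0.30, 0.00, 0.10, 0.20),
  c(NA,   0.40, NA,   0.10, 0.30, NA),
  c(NA,   NA,   NA,   NA,   NA,   NA)
)

n_reps   <- rowSums(!is.na(reps))          # replicates that grew
mean_all <- rowMeans(reps, na.rm = TRUE)   # NaN when n_reps == 0
mean_all[n_reps == 0] <- NA_real_          # make the missing case explicit

print(data.frame(n_reps, mean_all))
```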
P-values are estimated with Welch's two-sample, two-sided t-test (an adaptation of Student's t-test)
for(i in 1:nrow(Analysis_Final_3)){
  # All six LPI GT replicates of the strain (columns 35:40)
  test <- t(Analysis_Final_3[i, 35:40])
  x <- sum(!is.na(test[, 1]))
  # Test only strains with at least three non-missing replicates
  if(x>2){
    P.value_temp_M3 <- t.test(Intermediate_50_M3, test[which(!is.na(test[, 1]))])
    P.value_temp_M4 <- t.test(Crispri_control_M4, test[which(!is.na(test[, 1]))])
    Analysis_Final_3[i, 100] <- P.value_temp_M3$p.value
    Analysis_Final_3[i, 101] <- P.value_temp_M4$p.value
  } else {
    Analysis_Final_3[i, 100] <- NA
    Analysis_Final_3[i, 101] <- NA
  }
}
colnames(Analysis_Final_3)[100:101] <- c("P.value_M3", "P.value_M4")
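The per-strain test used for Methods 3 and 4 can be wrapped in a small helper, which makes the minimum-replicate rule explicit. This is a sketch on simulated data; the function name `welch_p`, the `min_n` argument and the toy replicate matrix are hypothetical and not part of the original analysis.

```r
# Welch p-value of one strain's replicates against a reference vector.
# Returns NA when fewer than `min_n` replicates are available.
welch_p <- function(reference, replicates, min_n = 3) {
  x <- replicates[!is.na(replicates)]
  if (length(x) < min_n) return(NA_real_)
  t.test(reference, x)$p.value   # var.equal = FALSE by default, i.e. Welch
}

set.seed(3)
reference <- rnorm(500, mean = 0.02, sd = 0.04)   # hypothetical reference
reps <- rbind(
  shifted = c(0.41, 0.38, 0.45, 0.40, 0.43, 0.39),
  sparse  = c(0.41, NA,   NA,   0.40, NA,   NA),
  neutral = c(0.01, 0.03, 0.02, 0.00, 0.04, 0.02)
)

# One p-value per strain (row); the sparse strain is not tested.
p <- apply(reps, 1, function(r) welch_p(reference, r))
print(p)
```

The same helper could be called once with the IQR vector and once with the control-strain vector as `reference` to obtain the Method 3 and Method 4 P-values.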
P-value adjustment by BENJAMINI-HOCHBERG False Discovery Rate (FDR) method
Analysis_Final_3[which(!is.na(Analysis_Final_3$P.value_M3)), 102] <- p.adjust(Analysis_Final_3$P.value_M3[which(!is.na(Analysis_Final_3$P.value_M3))],
method = "BH",
n = length(Analysis_Final_3$P.value_M3[which(!is.na(Analysis_Final_3$P.value_M3))]))
Analysis_Final_3[which(!is.na(Analysis_Final_3$P.value_M4)), 103] <- p.adjust(Analysis_Final_3$P.value_M4[which(!is.na(Analysis_Final_3$P.value_M4))],
method = "BH",
n = length(Analysis_Final_3$P.value_M4[which(!is.na(Analysis_Final_3$P.value_M4))]))
colnames(Analysis_Final_3)[102:103] <- c("P.adjusted_M3", "P.adjusted_M4")
NUMBER OF SIGNIFICANT STRAINS
length(Analysis_Final_3$P.value_M3[which(Analysis_Final_3$P.value_M3<=0.05)])
## [1] 2468
length(Analysis_Final_3$P.adjusted_M3[which(Analysis_Final_3$P.adjusted_M3<=0.05)])
## [1] 514
length(Analysis_Final_3$P.value_M3[which(Analysis_Final_3$P.value_M3<=0.1)])
## [1] 3392
length(Analysis_Final_3$P.adjusted_M3[which(Analysis_Final_3$P.adjusted_M3<=0.1)])
## [1] 1258
P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS
Figure 7 (Fig S11 in Manuscript): P-value diagnostic by histogram, Method 3
NUMBER OF SIGNIFICANT STRAINS
length(Analysis_Final_3$P.value_M4[which(Analysis_Final_3$P.value_M4<=0.05)])
## [1] 3663
length(Analysis_Final_3$P.adjusted_M4[which(Analysis_Final_3$P.adjusted_M4<=0.05)])
## [1] 2212
length(Analysis_Final_3$P.value_M4[which(Analysis_Final_3$P.value_M4<=0.1)])
## [1] 4545
length(Analysis_Final_3$P.adjusted_M4[which(Analysis_Final_3$P.adjusted_M4<=0.1)])
## [1] 3306
P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS
Figure 8: P-value diagnostic by histogram, Method 4
The P-values generated by Method 3 can be corrected efficiently using the FDR method, and after the correction the P.adjusted values are distributed nearly evenly, which is indicative of a robust statistical outcome. Method 3 is therefore a good statistical method for this dataset.
Although Method 4 is effective at identifying the candidates that deviate most from the CRISPRi control mean, the FDR method is less effective on the P-values it generates, which makes it less suitable for the current dataset. Moreover, the CRISPRi control strains, for an unknown reason, consistently displayed slower growth under acetic acid than the mean of the population. This biased the candidate selection under Method 4.
Of the four statistical methods evaluated, METHOD 3 was the most promising for identifying significant candidates. For this study we therefore used the results of statistical Method 3 for downstream analysis.
Number of strains with Adjusted P-value ≤ 0.1
length(Analysis_Final_3$P.adjusted_M3[which(Analysis_Final_3$P.adjusted_M3 <= 0.1)])
## [1] 1258
To avoid missing potential candidates merely because of high variability among the replicates, we keep the adjusted P-value threshold less strict, i.e. ≤ 0.1. In addition, we introduce an effect-size threshold, i.e. the phenotypic performance range of the CRISPRi control strains.
max(Analysis_Final_3$LPI_GT_Mean_all[which(Analysis_Final_3$Control.gRNA==1)])
## [1] 0.165662
Therefore, any strain that has an adjusted P-value ≤ 0.1 AND a mean LPI GT > 0.165662 will be considered SENSITIVE to acetic acid
min(Analysis_Final_3$LPI_GT_Mean_all[which(Analysis_Final_3$Control.gRNA==1)])
## [1] -0.03680838
Therefore, any strain that has an adjusted P-value ≤ 0.1 AND a mean LPI GT < -0.03680838 will be considered TOLERANT to acetic acid
candidate_padj_0.1_FIT_M3 <- which((Analysis_Final_3$LPI_GT_Mean_all < -0.03680838 & Analysis_Final_3$P.adjusted_M3<= 0.1))
length(candidate_padj_0.1_FIT_M3)
## [1] 478
This gives 478 ACETIC ACID TOLERANT strains
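The two selection rules (adjusted P-value plus effect-size threshold) can be combined into a single classification column. The sketch below applies the thresholds quoted above to a made-up table; the data frame `res`, its column names and the label "UNCLASSIFIED" are hypothetical and used only for illustration.

```r
# Hypothetical per-strain summary; column names are stand-ins.
res <- data.frame(
  strain      = c("a", "b", "c", "d", "e"),
  lpi_gt_mean = c(-0.30, 0.25, 0.02, -0.20, 0.40),
  padj        = c(0.010, 0.080, 0.001, 0.500, NA),
  stringsAsFactors = FALSE
)

# Thresholds quoted in the text: control-strain range and adjusted P cut-off.
upper <- 0.165662
lower <- -0.03680838
alpha <- 0.1

# A missing adjusted p-value counts as not significant.
significant <- !is.na(res$padj) & res$padj <= alpha

res$call <- "UNCLASSIFIED"
res$call[significant & res$lpi_gt_mean > upper] <- "SENSITIVE"
res$call[significant & res$lpi_gt_mean < lower] <- "TOLERANT"

print(res)
```

Strain "c" is significant but lies inside the control range, and strain "d" lies outside it but is not significant, so neither is called.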
Fit_M3_complete <- Analysis_Final_3[candidate_padj_0.1_FIT_M3, ]
Fit_M3_complete <- Fit_M3_complete[order(Fit_M3_complete$LPI_GT_Mean_all, decreasing = FALSE), ]
str(Fit_M3_complete)
## 'data.frame': 478 obs. of 103 variables:
## $ gRNA_name : chr "RPN9-TRg-4" "RGL1-NRg-7" "RPN9-NRg-7" "POP3-NRg-5" ...
## $ Seq : chr "ACCCGCTCCCCGCTTTCATC" "GCTCTTGTTTAGTAGGCGTG" "ACCGGATGAAAGCGGGGAGC" "CAAATATCCGCCCTGGCAAT" ...
## $ SOURCEPLATEID : chr "R2877.H.021" "R2877.H.002" "R2877.H.020" "R2877.H.022" ...
## $ SOURCECOLONYCOLUMN : int 1 3 7 19 4 3 18 7 1 3 ...
## $ SOURCECOLONYROW : chr "B" "P" "N" "C" ...
## $ GENE : chr "RPN9" "RGL1" "RPN9" "POP3" ...
## $ Control.gRNA : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Location_1536 : chr "C1" "AE5" "AA13" "E37" ...
## $ CTRL_GT_RND1_R1 : num 0.07603 -0.05695 0.02247 -0.01067 0.00419 ...
## $ CTRL_GT_RND1_R2 : num 0.08332 -0.06969 -0.02313 0.01964 0.00952 ...
## $ CTRL_GT_RND1_R3 : num 0.16235 -0.00953 0.00817 0.06949 -0.03925 ...
## $ CTRL_GT_RND2_R1 : num -0.0229 -0.01756 0.08764 0.0332 0.00409 ...
## $ CTRL_GT_RND2_R2 : num 0.01055 -0.01944 -0.0114 0.02561 0.00428 ...
## $ CTRL_GT_RND2_R3 : num -0.00635 -0.06164 0.0289 0.03377 -0.01342 ...
## $ CTRL_GT_RND1_MEAN : num 0.10723 -0.04539 0.00251 0.02615 -0.00851 ...
## $ CTRL_GT_RND2_MEAN : num -0.00624 -0.03288 0.03505 0.03086 -0.00168 ...
## $ CTRL_GT_RND1_SD : num 0.0479 0.0317 0.0233 0.0405 0.0267 ...
## $ CTRL_GT_RND2_SD : num 0.01672 0.02492 0.0498 0.00455 0.01016 ...
## $ CTRL_GT_RND1_2_MEAN : num 0.0505 -0.0391 0.0188 0.0285 -0.0051 ...
## $ CTRL_GT_RND1_2_MED : num 0.04329 -0.0382 0.01532 0.0294 0.00414 ...
## $ CTRL_GT_RND1_2_SD : num 0.0699 0.0264 0.0391 0.0259 0.0185 ...
## $ AA_GT_RND1_R1 : num -0.525 -0.44 -0.386 -0.34 -0.263 ...
## $ AA_GT_RND1_R2 : num -0.767 -0.484 -0.488 -0.312 -0.257 ...
## $ AA_GT_RND1_R3 : num -0.621 -0.338 -0.4 -0.375 -0.363 ...
## $ AA_GT_RND2_R1 : num -0.2124 -0.1735 0.0426 -0.107 -0.2219 ...
## $ AA_GT_RND2_R2 : num -0.172 -0.2 -0.12 -0.176 -0.256 ...
## $ AA_GT_RND2_R3 : num -0.1797 -0.2696 -0.1352 -0.0916 -0.1875 ...
## $ AA_GT_RND1_MEAN : num -0.638 -0.421 -0.425 -0.342 -0.294 ...
## $ AA_GT_RND2_MEAN : num -0.1881 -0.2144 -0.0709 -0.1249 -0.2218 ...
## $ AA_GT_RND1_SD : num 0.1214 0.0746 0.055 0.0317 0.0595 ...
## $ AA_GT_RND2_SD : num 0.0214 0.0496 0.0985 0.045 0.0343 ...
## $ AA_GT_RND1_2_MEAN : num -0.413 -0.318 -0.248 -0.234 -0.258 ...
## $ AA_GT_RND1_2_MED : num -0.369 -0.304 -0.261 -0.244 -0.257 ...
## $ AA_GT_RND1_2_SD : num 0.2583 0.1265 0.2066 0.124 0.0588 ...
## $ LPI_GT_RND1_R1 : num -0.602 -0.383 -0.409 -0.329 -0.267 ...
## $ LPI_GT_RND1_R2 : num -0.85 -0.414 -0.465 -0.332 -0.267 ...
## $ LPI_GT_RND1_R3 : num -0.783 -0.329 -0.408 -0.445 -0.324 ...
## $ LPI_GT_RND2_R1 : num -0.1895 -0.156 -0.0451 -0.1402 -0.226 ...
## $ LPI_GT_RND2_R2 : num -0.183 -0.181 -0.109 -0.202 -0.26 ...
## $ LPI_GT_RND2_R3 : num -0.173 -0.208 -0.164 -0.125 -0.174 ...
## $ LPI_GT_RND1_MEAN : num -0.745 -0.375 -0.427 -0.368 -0.286 ...
## $ LPI_GT_RND2_MEAN : num -0.182 -0.181 -0.106 -0.156 -0.22 ...
## $ LPI_GT_RND1_SD : num 0.1285 0.0432 0.0324 0.0661 0.0328 ...
## $ LPI_GT_RND2_SD : num 0.00809 0.02602 0.05954 0.04043 0.04341 ...
## $ LPI_GT_RND1_2_MEAN : num -0.463 -0.278 -0.267 -0.262 -0.253 ...
## $ LPI_GT_RND1_2_MED : num -0.395 -0.268 -0.286 -0.265 -0.263 ...
## $ LPI_GT_RND1_2_SD : num 0.319 0.1109 0.1812 0.1264 0.0497 ...
## $ CTRL_Y_RND1_R1 : num 0.1577 -0.2668 -0.0429 0.2105 0.0193 ...
## $ CTRL_Y_RND1_R2 : num -0.0673 -0.2514 -0.0582 0.1997 0.0202 ...
## $ CTRL_Y_RND1_R3 : num 0.2922 -0.0261 -0.0683 0.1684 0.058 ...
## $ CTRL_Y_RND2_R1 : num 0.0912 -0.0829 -0.0174 0.0119 0.0743 ...
## $ CTRL_Y_RND2_R2 : num -0.19136 -0.09411 -0.00691 0.11114 0.07309 ...
## $ CTRL_Y_RND2_R3 : num 0.295 -0.0282 -0.0197 -0.0138 0.0828 ...
## $ CTRL_Y_RND1_MEAN : num 0.1275 -0.1815 -0.0565 0.1929 0.0325 ...
## $ CTRL_Y_RND2_MEAN : num 0.0649 -0.0684 -0.0147 0.0364 0.0767 ...
## $ CTRL_Y_RND1_SD : num 0.1816 0.1347 0.0128 0.0219 0.0221 ...
## $ CTRL_Y_RND2_SD : num 0.24423 0.03528 0.00683 0.06596 0.00529 ...
## $ CTRL_Y_RND1_2_MEAN : num 0.0962 -0.1249 -0.0356 0.1147 0.0546 ...
## $ CTRL_Y_RND1_2_MED : num 0.1245 -0.0885 -0.0313 0.1398 0.0656 ...
## $ CTRL_Y_RND1_2_SD : num 0.1955 0.1077 0.0247 0.0963 0.0282 ...
## $ AA_Y_RND1_R1 : num 1.696 -0.399 1.145 1.162 0.45 ...
## $ AA_Y_RND1_R2 : num 1.589 -0.333 1.135 1.103 0.45 ...
## $ AA_Y_RND1_R3 : num 1.528 -0.161 1.087 1.2 0.543 ...
## $ AA_Y_RND2_R1 : num 0.644 0.28 0.11 0.187 0.405 ...
## $ AA_Y_RND2_R2 : num 0.279 0.368 0.185 0.18 0.416 ...
## $ AA_Y_RND2_R3 : num 0.461 0.606 0.27 0.147 0.494 ...
## $ AA_Y_RND1_MEAN : num 1.604 -0.298 1.122 1.155 0.481 ...
## $ AA_Y_RND2_MEAN : num 0.461 0.418 0.188 0.171 0.438 ...
## $ AA_Y_RND1_SD : num 0.0849 0.1229 0.0308 0.0486 0.0538 ...
## $ AA_Y_RND2_SD : num 0.1827 0.1685 0.0796 0.021 0.0484 ...
## $ AA_Y_RND1_2_MEAN : num 1.0327 0.0602 0.6551 0.6631 0.4594 ...
## $ AA_Y_RND1_2_MED : num 1.086 0.0597 0.6782 0.645 0.4497 ...
## $ AA_Y_RND1_2_SD : num 0.6389 0.4137 0.5143 0.5399 0.0514 ...
## $ LPI_Y_RND1_R1 : num 1.538 -0.132 1.187 0.952 0.43 ...
## $ LPI_Y_RND1_R2 : num 1.6563 -0.0818 1.1928 0.9034 0.4296 ...
## $ LPI_Y_RND1_R3 : num 1.236 -0.135 1.155 1.031 0.485 ...
## $ LPI_Y_RND2_R1 : num 0.553 0.363 0.128 0.175 0.331 ...
## $ LPI_Y_RND2_R2 : num 0.4699 0.4621 0.1915 0.0684 0.3426 ...
## $ LPI_Y_RND2_R3 : num 0.166 0.634 0.289 0.161 0.411 ...
## $ LPI_Y_RND1_MEAN : num 1.477 -0.116 1.178 0.962 0.448 ...
## $ LPI_Y_RND2_MEAN : num 0.396 0.486 0.203 0.135 0.361 ...
## $ LPI_Y_RND1_SD : num 0.2168 0.0299 0.0203 0.0644 0.0317 ...
## $ LPI_Y_RND2_SD : num 0.2035 0.1371 0.0813 0.0579 0.0432 ...
## $ LPI_Y_RND1_2_MEAN : num 0.937 0.185 0.691 0.548 0.405 ...
## $ LPI_Y_RND1_2_MED : num 0.894 0.141 0.722 0.539 0.42 ...
## $ LPI_Y_RND1_2_SD : num 0.621 0.3419 0.537 0.4564 0.0585 ...
## $ CTRL_GT_MEAN_RND1_2_SD: num 0.08023 0.00885 0.02301 0.00333 0.00483 ...
## $ AA_GT_MEAN_RND1_2_SD : num 0.3179 0.146 0.2502 0.1537 0.0511 ...
## $ LPI_GT_MEAN_RND1_2_SD : num 0.3981 0.1371 0.2272 0.1504 0.0463 ...
## $ P_value_M1 : num 0.3346 0.1987 0.3234 0.228 0.0754 ...
## $ P.adjusted_M1 : num 0.91 0.91 0.91 0.91 0.91 ...
## $ P_value_RND1_M2 : num 5.58e-03 2.47e-04 9.67e-06 2.50e-03 3.55e-05 ...
## $ P_value_RND2_M2 : num 5.25e-17 6.93e-04 1.15e-01 1.47e-02 7.13e-03 ...
## $ P.adjusted_RND1_M2 : num 0.02164 0.002034 0.000126 0.01246 0.000397 ...
## $ P.adjusted_RND2_M2 : num 2.24e-14 1.12e-02 3.06e-01 9.49e-02 5.98e-02 ...
## $ CTRL_GT_Mean_all : num 0.0505 -0.0391 0.0188 0.0285 -0.0051 ...
## $ n_CTRL : int 6 6 6 6 6 6 6 6 6 6 ...
## $ LPI_GT_Mean_all : num -0.463 -0.278 -0.267 -0.262 -0.253 ...
## $ n_LPI : int 6 6 6 6 6 6 6 6 6 6 ...
## [list output truncated]
Gene description key file: Gene_List_CRISPRi_lib.csv
whole_Gene_list_Final <- read.csv("COMPILED_DATA/Gene_List_CRISPRi_lib.csv", na.strings = "", stringsAsFactors = FALSE)
rownames(whole_Gene_list_Final) <- whole_Gene_list_Final$LIB_ID
Fit_all_M3 <- data.frame(sort(table(Analysis_Final_3$GENE[candidate_padj_0.1_FIT_M3]), decreasing = TRUE))
y <- as.character(Fit_all_M3$Var1)
x <- whole_Gene_list_Final[y, ]
Fit_all_M3_description <- cbind(Fit_all_M3, x[, -1])
str(Fit_all_M3_description)
## 'data.frame': 370 obs. of 8 variables:
## $ Var1 : Factor w/ 370 levels "PEP7","RPN9",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Freq : int 5 5 5 4 3 3 3 3 3 3 ...
## $ SGD_DB_ID : chr "S000002731" "S000002835" "S000001899" "S000006069" ...
## $ SYS_ID : chr "YDR323C" "YDR427W" "YFR003C" "YPL148C" ...
## $ GENE_SYM : chr "PEP7" "RPN9" "YPI1" "PPT2" ...
## $ NAME : chr "carboxyPEPtidase Y-deficient" "Regulatory Particle Non-ATPase" "Yeast Phosphatase Inhibitor" "Phosphopantetheine:Protein Transferase" ...
## $ PHENOTYPE : chr NA "Non-essential gene; null mutant is sensitive to elevated temperatures, shows cell cycle arrest in metaphase, an"| __truncated__ NA NA ...
## $ DESCRIPTION: chr "Adaptor protein involved in vesicle-mediated vacuolar protein sorting; multivalent adaptor protein; facilitates"| __truncated__ "Non-ATPase regulatory subunit of the 26S proteasome; similar to putative proteasomal subunits in other species;"| __truncated__ "Regulatory subunit of the type I protein phosphatase (PP1) Glc7p; Glc7p participates in the regulation of a var"| __truncated__ "Phosphopantetheine:protein transferase (PPTase); activates mitochondrial acyl carrier protein (Acp1p) by phosph"| __truncated__ ...
nrow(Fit_all_M3_description)
## [1] 370
This gives 370 CRISPRi target genes that induced acetic acid TOLERANCE
super_sen_M3 <- Analysis_Final_3[which(!is.na(Analysis_Final_3$CTRL_GT_Mean_all)
                                       & (Analysis_Final_3$n_LPI < 3)
                                       & (is.na(Analysis_Final_3$LPI_GT_Mean_all)
                                          | (Analysis_Final_3$LPI_GT_Mean_all > 0.165662))
                                       ), ]
nrow(super_sen_M3)
## [1] 17
This gives 17 ACETIC ACID SUPER SENSITIVE strains
candidate_padj_0.1_SEN_M3 <- which((Analysis_Final_3$LPI_GT_Mean_all > 0.165662 & Analysis_Final_3$P.adjusted_M3<= 0.1))
length(candidate_padj_0.1_SEN_M3)
## [1] 481
This gives 481 ACETIC ACID SENSITIVE strains.
Sen_M3_complete <- rbind(super_sen_M3, Analysis_Final_3[candidate_padj_0.1_SEN_M3, ])
Sen_M3_complete <- Sen_M3_complete[order(Sen_M3_complete$LPI_GT_Mean_all, decreasing = TRUE), ]
nrow(Sen_M3_complete)
## [1] 498
In TOTAL, 481+17 = 498 strains displayed acetic acid SENSITIVITY
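The threshold logic used to call strains can be sketched on toy data. This is a minimal illustration, not the analysis code itself: the data frame `toy`, the helper `classify_strain()` and all values are invented, the sensitive cutoff 0.165662 is the one used in the code above, and the tolerant cutoff -0.037 is the rounded value quoted in the Figure 11 description. The separate rule for super sensitive strains (fewer than 3 replicates growing) is not covered here.

```r
# Toy data: column names mirror Analysis_Final_3, but the values are invented.
toy <- data.frame(
  gRNA_name       = c("g1", "g2", "g3", "g4", "g5"),
  LPI_GT_Mean_all = c(0.40, 0.20, -0.10, 0.01, -0.05),
  P.adjusted_M3   = c(0.01, 0.50, 0.02, 0.03, 0.09),
  stringsAsFactors = FALSE
)

# Cutoffs quoted in this document (assumed here to apply unchanged).
sen_cutoff  <- 0.165662  # maximum LPI_GT of the CRISPRi control strains
fit_cutoff  <- -0.037    # minimum LPI_GT of the CRISPRi control strains (rounded)
padj_cutoff <- 0.1       # FDR-adjusted P-value cutoff

# Hypothetical helper: label each strain from its mean LPI_GT and adjusted P-value.
classify_strain <- function(lpi, padj) {
  label <- rep("unclassified", length(lpi))
  significant <- !is.na(padj) & padj <= padj_cutoff
  label[significant & !is.na(lpi) & lpi > sen_cutoff] <- "sensitive"
  label[significant & !is.na(lpi) & lpi < fit_cutoff] <- "tolerant"
  label
}

toy$class <- classify_strain(toy$LPI_GT_Mean_all, toy$P.adjusted_M3)
table(toy$class)
```

A strain is only labelled when both conditions hold, so a large effect with a poor adjusted P-value (g2) stays unclassified.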
Sen_all_M3 <- data.frame(sort(table(c(Analysis_Final_3$GENE[candidate_padj_0.1_SEN_M3], super_sen_M3$GENE)), decreasing = TRUE))
y <- as.character(Sen_all_M3$Var1)
x <- whole_Gene_list_Final[y, ]
Sen_all_M3_description <- cbind(Sen_all_M3, x[, -1])
nrow(Sen_all_M3_description)
## [1] 367
This gives 367 CRISPRi target genes that induced acetic acid SENSITIVITY
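The step from strains to unique target genes can be illustrated with a small sketch. The vector `hit_genes` and the table `annotation` are invented stand-ins for the gene column of the selected strains and for whole_Gene_list_Final; the three SGD identifiers are the ones shown in the output above.

```r
# Invented strain-to-gene assignments: one entry per selected strain.
hit_genes <- c("RPN9", "PEP7", "RPN9", "YPI1", "PEP7", "RPN9")

# Hypothetical annotation table keyed by gene symbol through its row names.
annotation <- data.frame(
  LIB_ID    = c("PEP7", "RPN9", "YPI1"),
  SGD_DB_ID = c("S000002731", "S000002835", "S000001899"),
  stringsAsFactors = FALSE
)
rownames(annotation) <- annotation$LIB_ID

# Count strains per gene, most frequently hit gene first.
gene_counts <- as.data.frame(sort(table(hit_genes), decreasing = TRUE),
                             stringsAsFactors = FALSE)
colnames(gene_counts) <- c("GENE", "Freq")

# Look up annotation rows by row name, in the same order as the counts.
gene_counts$SGD_DB_ID <- annotation[gene_counts$GENE, "SGD_DB_ID"]
gene_counts
```

The number of rows of the resulting table is the number of unique target genes, and Freq is the number of strains (gRNAs) per gene that showed the phenotype.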
Data preparation
Extracting the SGD_ID for the unique genes in Fit_all_M3 (see EXTRACT ALL CRISPRi TARGET GENES THAT INDUCED ACETIC ACID TOLERANCE)
Fit_unique_M3 <- as.character(Fit_all_M3$Var1)
x <- whole_Gene_list_Final[Fit_unique_M3, ]
Fit_unique_M3_SGD_ID <- x$SGD_DB_ID
str(Fit_unique_M3_SGD_ID)
## chr [1:370] "S000002731" "S000002835" "S000001899" "S000006069" ...
Extracting the SGD_ID for the unique genes in Sen_all_M3 (see EXTRACT ALL CRISPRi TARGET GENES THAT INDUCED ACETIC ACID SENSITIVITY)
Sen_unique_M3 <- as.character(Sen_all_M3$Var1)
x <- whole_Gene_list_Final[Sen_unique_M3, ]
Sen_unique_M3_SGD_ID <- x$SGD_DB_ID
str(Sen_unique_M3_SGD_ID)
## chr [1:367] "S000003191" "S000003105" "S000006015" "S000001283" ...
Perform GO analysis with the above gene identifier sets in the Saccharomyces Genome Database (SGD).
Here we present the SCAN-O-MATIC data in graphs and charts.
INSTALL
Plot some representative growth curves from scan-o-matic.
The growth curve data were generated by running the flatten_curves_2.py script (obtained from Simon Stenberg, Gothenburg University, Sweden, and available on request) in the scan-o-matic analysis folder generated within the project folder. The script generates a curves_flat.csv file in that analysis folder. For the representative growth curves, we generated this curves_flat.csv for the project that holds the growth output of plates 7 and 8 in basal and acetic acid conditions in screening Round 1. The file was then renamed Data_for_Representative_GC_SOM.csv and is available in our COMPILED_DATA folder.
Growth_curve_data <- read.csv("COMPILED_DATA/Data_for_Representative_GC_SOM.csv", sep = "\t", header = TRUE)
str(Growth_curve_data)
## 'data.frame': 255 obs. of 6145 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ X0_0_0 : num 21659 21516 21142 20760 20348 ...
## $ X0_0_1 : num 181676 183347 189180 197935 209654 ...
## $ X0_0_2 : num 152110 153658 158302 165449 175688 ...
## $ X0_0_3 : num 153383 155001 160064 167504 177861 ...
## $ X0_0_4 : num 150199 152044 157029 164412 174582 ...
## $ X0_0_5 : num 157769 159504 165478 173945 185550 ...
## $ X0_0_6 : num 157492 159423 165380 173741 185011 ...
## $ X0_0_7 : num 163998 165903 172040 180626 192484 ...
## $ X0_0_8 : num 160719 162364 167986 176187 187369 ...
## $ X0_0_9 : num 158611 160418 165957 173936 184680 ...
## $ X0_0_10 : num 152766 154485 160248 168237 179351 ...
## $ X0_0_11 : num 144377 146154 151480 159292 169859 ...
## $ X0_0_12 : num 155710 157550 163646 172042 183956 ...
## $ X0_0_13 : num 162645 164493 170546 179213 190696 ...
## $ X0_0_14 : num 147321 149000 154870 163220 174498 ...
## $ X0_0_15 : num 157033 158879 164907 173405 185125 ...
## $ X0_0_16 : num 150163 152295 158376 167082 178636 ...
## $ X0_0_17 : num 149176 151136 157494 166112 177624 ...
## $ X0_0_18 : num 143700 145653 150989 158896 169646 ...
## $ X0_0_19 : num 148948 150521 156191 164286 175496 ...
## $ X0_0_20 : num 145147 147187 152912 161073 171831 ...
## $ X0_0_21 : num 142052 143518 149019 156608 167246 ...
## $ X0_0_22 : num 146214 148212 153862 161936 172927 ...
## $ X0_0_23 : num 140830 142385 148023 155925 166785 ...
## $ X0_0_24 : num 134855 136393 141402 148690 158693 ...
## $ X0_0_25 : num 149492 151342 156745 164620 175547 ...
## $ X0_0_26 : num 141515 143191 148426 155987 166452 ...
## $ X0_0_27 : num 143168 145143 150564 158322 168999 ...
## $ X0_0_28 : num 137947 139703 144873 152631 163225 ...
## $ X0_0_29 : num 130303 131934 136797 143906 153586 ...
## $ X0_0_30 : num 140933 142587 147920 155537 165961 ...
## $ X0_0_31 : num 132977 134740 139798 147036 156743 ...
## $ X0_0_32 : num 152496 154112 159580 167378 178165 ...
## $ X0_0_33 : num 149542 151175 156140 163476 173447 ...
## $ X0_0_34 : num 136258 137622 142480 149640 159691 ...
## $ X0_0_35 : num 142896 144491 149719 157530 167855 ...
## $ X0_0_36 : num 146818 148602 153988 161864 172516 ...
## $ X0_0_37 : num 139913 141419 146601 153831 163993 ...
## $ X0_0_38 : num 143241 145000 150354 158056 168380 ...
## $ X0_0_39 : num 143393 145107 150475 158123 168609 ...
## $ X0_0_40 : num 135160 136890 141943 149198 158987 ...
## $ X0_0_41 : num 134769 136296 141351 148585 158703 ...
## $ X0_0_42 : num 144394 145768 150783 157971 167928 ...
## $ X0_0_43 : num 138399 140104 145184 152572 162749 ...
## $ X0_0_44 : num 124673 126352 131224 138081 147141 ...
## $ X0_0_45 : num 114587 115846 120020 125885 134350 ...
## $ X0_0_46 : num 110348 111676 116271 122616 131179 ...
## $ X0_0_47 : num 117456 119363 124589 131727 140869 ...
## $ X0_1_0 : num 22810 23128 23870 24799 25829 ...
## $ X0_1_1 : num 145029 147034 153453 161576 173377 ...
## $ X0_1_2 : num 126534 128386 133739 141434 152168 ...
## $ X0_1_3 : num 139343 141248 147201 155430 167194 ...
## $ X0_1_4 : num 139499 140724 145191 151964 161708 ...
## $ X0_1_5 : num 144551 145739 150218 157163 166766 ...
## $ X0_1_6 : num 143214 144462 149041 155715 165060 ...
## $ X0_1_7 : num 151872 153319 158240 165366 175326 ...
## $ X0_1_8 : num 141396 143111 148785 156579 167330 ...
## $ X0_1_9 : num 160825 162958 168962 177289 188668 ...
## $ X0_1_10 : num 133036 134553 139455 146616 156589 ...
## $ X0_1_11 : num 152913 154811 160957 169438 180925 ...
## $ X0_1_12 : num 143398 144777 150134 157720 168074 ...
## $ X0_1_13 : num 155655 156998 161956 169314 179452 ...
## $ X0_1_14 : num 132847 133928 138343 144997 154503 ...
## $ X0_1_15 : num 137801 139640 145204 153077 163597 ...
## $ X0_1_16 : num 133666 135220 140380 147735 157941 ...
## $ X0_1_17 : num 145284 146605 152138 160181 170900 ...
## $ X0_1_18 : num 133894 135037 139321 146012 155380 ...
## $ X0_1_19 : num 142898 144185 149398 156778 167194 ...
## $ X0_1_20 : num 134633 136386 141744 149450 159842 ...
## $ X0_1_21 : num 141251 142372 147568 154874 165134 ...
## $ X0_1_22 : num 136226 138091 143321 150859 161098 ...
## $ X0_1_23 : num 141314 142956 148132 155448 165359 ...
## $ X0_1_24 : num 129189 130531 135453 142561 152295 ...
## $ X0_1_25 : num 140901 142546 147494 154677 164453 ...
## $ X0_1_26 : num 134427 135692 140321 147041 156181 ...
## $ X0_1_27 : num 138449 140122 145260 152646 162672 ...
## $ X0_1_28 : num 132034 133511 138861 146125 156225 ...
## $ X0_1_29 : num 141238 143038 147991 155146 164859 ...
## $ X0_1_30 : num 132174 133310 138048 144650 153980 ...
## $ X0_1_31 : num 140381 141863 145986 152715 162049 ...
## $ X0_1_32 : num 126915 128482 133721 140874 150809 ...
## $ X0_1_33 : num 139928 141649 146794 154421 164627 ...
## $ X0_1_34 : num 129303 130638 135409 142312 151965 ...
## $ X0_1_35 : num 136006 137780 143515 151541 162142 ...
## $ X0_1_36 : num 124382 126026 131319 138754 148893 ...
## $ X0_1_37 : num 135698 137395 143070 150813 161206 ...
## $ X0_1_38 : num 120918 122818 128164 135497 145464 ...
## $ X0_1_39 : num 135871 137852 143947 152148 163011 ...
## $ X0_1_40 : num 124747 126548 131563 138408 147764 ...
## $ X0_1_41 : num 132137 133926 139524 147177 157657 ...
## $ X0_1_42 : num 137212 138874 144363 152079 162437 ...
## $ X0_1_43 : num 146906 148537 153929 161533 172109 ...
## $ X0_1_44 : num 129890 131522 136845 144248 154354 ...
## $ X0_1_45 : num 135719 137406 142752 150099 160076 ...
## $ X0_1_46 : num 115419 116985 121873 128508 137603 ...
## $ X0_1_47 : num 122226 123630 127756 133947 142124 ...
## $ X0_2_0 : num 201430 203145 208703 217131 228514 ...
## $ X0_2_1 : num 136834 138043 142287 148472 157794 ...
## [list output truncated]
The plate indices in the column names correspond to: Plate0 = Plate7_Basal, Plate1 = Plate8_Basal, Plate2 = Plate7_AceticAcid, Plate3 = Plate8_AceticAcid.
Each plate has 1536 colonies, i.e. 384 strains x 3 replicates + 384 spatial controls.
The FIRST COLUMN is the image number, with 0 being the first image.
After the first column there are 1536 * 4 = 6144 more columns, i.e. the data of each colony is one column. The naming format is as below:
X[Plate_number]_[row_number]_[column_number]
All numbers start from zero. Therefore, plate numbers range from 0 to 3. Each 1536-format plate has 32 rows and 48 columns, so row numbers range from 0 to 31 and column numbers from 0 to 47.
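As an illustration of this naming scheme, the sketch below converts a 1536-format plate location such as U4 or AE23 into the corresponding column name. The helpers `location_to_index()` and `colony_colname()` are hypothetical and not part of the original analysis. It assumes that row labels run A to Z and then continue with AA, AB and so on, which is consistent with the positions used for the representative strains.

```r
# Convert a 1536-format location such as "U4" or "AE23" to zero-based indices.
# Assumption: row labels run A..Z and then AA..AF for the 32 rows; columns are 1..48.
location_to_index <- function(location) {
  row_label <- sub("[0-9]+$", "", location)   # leading letters
  col_label <- sub("^[A-Z]+", "", location)   # trailing digits
  letters_in_label <- strsplit(row_label, "")[[1]]
  # Read the row label as a base-26 number (A = 1), then shift to zero-based.
  row_index <- 0
  for (ch in letters_in_label) {
    row_index <- row_index * 26 + match(ch, LETTERS)
  }
  c(row = row_index - 1, col = as.integer(col_label) - 1)
}

# Build the column name used in the growth curve file for a plate index and location.
colony_colname <- function(plate_index, location) {
  idx <- location_to_index(location)
  paste0("X", plate_index, "_", idx[["row"]], "_", idx[["col"]])
}

colony_colname(0, "U4")    # "X0_20_3"
colony_colname(3, "AE23")  # "X3_30_22"
```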
Now we extract the data of only 3 strains from the entire dataset, i.e. one strain that displayed acetic acid sensitivity, one strain with slight acetic acid tolerance, and one control strain. The selected strains and their respective positions are obtained from the raw dataset whole_data_CRISPRi_aa
| Strain characteristics | Strain name | Plate number | Location (1536 format) | Column name (basal) | Column name (acetic acid) |
|---|---|---|---|---|---|
| Acetic acid tolerant | POL2-NRg-1 | Plate7 | U4 | X0_20_3 | X2_20_3 |
| Acetic acid sensitive | RRP15-TRg-4 | Plate7 | E4 | X0_4_3 | X2_4_3 |
| Control strain | CC23 | Plate8 | AE23 | X1_30_22 | X3_30_22 |
Therefore, extract the data of the above columns together with the first column (the image number) and save them in a new variable. Then change the column names to the [gRNA]_[condition] format
Growth_curve_data_selected <- Growth_curve_data[, c("X", "X0_20_3", "X2_20_3", "X0_4_3", "X2_4_3", "X1_30_22", "X3_30_22")]
colnames(Growth_curve_data_selected) <- c("Time", "POL2-NRg-1_Basal", "POL2-NRg-1_Acetic", "RRP15-TRg-4_Basal", "RRP15-TRg-4_Acetic", "CC23_Basal", "CC23_Acetic")
str(Growth_curve_data_selected)
## 'data.frame': 255 obs. of 7 variables:
## $ Time : int 0 1 2 3 4 5 6 7 8 9 ...
## $ POL2-NRg-1_Basal : num 113652 114873 119049 124818 132739 ...
## $ POL2-NRg-1_Acetic : num 85018 85018 85057 85317 85615 ...
## $ RRP15-TRg-4_Basal : num 134222 135491 139949 146494 156017 ...
## $ RRP15-TRg-4_Acetic: num 81686 81982 82500 83028 83565 ...
## $ CC23_Basal : num 125292 126829 131175 137350 145429 ...
## $ CC23_Acetic : num 96983 97229 97975 98813 99707 ...
Images are automatically taken 20 minutes apart. Therefore, image number*20/60 gives the time point in hours. We convert the first column to time points accordingly.
Growth_curve_data_selected[, 1] <- Growth_curve_data_selected[, 1]*20/60
Convert the data.frame to long format and save it in a new variable
library(reshape)
# The call below uses the base R function stats::reshape(); v.names is spelled out in full
Growth_curve_data_selected_long <- reshape(data = Growth_curve_data_selected, idvar = "Time",
                                           varying = colnames(Growth_curve_data_selected)[2:7],
                                           v.names = c("Population_size"),
                                           new.row.names = 1:30000,
                                           direction = "long",
                                           timevar = "gRNA_condition",
                                           times = colnames(Growth_curve_data_selected)[2:7])
str(Growth_curve_data_selected_long)
## 'data.frame': 1530 obs. of 3 variables:
## $ Time : num 0 0.333 0.667 1 1.333 ...
## $ gRNA_condition : chr "POL2-NRg-1_Basal" "POL2-NRg-1_Basal" "POL2-NRg-1_Basal" "POL2-NRg-1_Basal" ...
## $ Population_size: num 113652 114873 119049 124818 132739 ...
## - attr(*, "reshapeLong")=List of 4
## ..$ varying:List of 1
## .. ..$ Population_size: chr [1:6] "POL2-NRg-1_Basal" "POL2-NRg-1_Acetic" "RRP15-TRg-4_Basal" "RRP15-TRg-4_Acetic" ...
## .. ..- attr(*, "v.names")= chr "Population_size"
## .. ..- attr(*, "times")= chr [1:6] "POL2-NRg-1_Basal" "POL2-NRg-1_Acetic" "RRP15-TRg-4_Basal" "RRP15-TRg-4_Acetic" ...
## ..$ v.names: chr "Population_size"
## ..$ idvar : chr "Time"
## ..$ timevar: chr "gRNA_condition"
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Figure 9 (Part of Fig. 1 in Manuscript): Representative growth curves
Scatterplot to display the reproducibility of the two scan-o-matic screenings. For each strain, the mean of the three LPI_GT replicates of round 1 is plotted on the X-axis and that of round 2 on the Y-axis. The data of the CRISPRi control strains are indicated with green dots, acetic acid sensitive strains with red dots and acetic acid tolerant strains with blue dots. Data of all other strains are indicated with black dots.
Figure 10 (Fig. 2A in Manuscript): DATA REPRODUCIBILITY
summary(stats_LPI_GT_Mean_RND1vsRND2_M3)
##
## Call:
## lm(formula = LPI_GT_RND2_MEAN ~ LPI_GT_RND1_MEAN, data = Analysis_Final_3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.71004 -0.04877 -0.00146 0.04439 1.05360
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.009341 0.001014 -9.215 <2e-16 ***
## LPI_GT_RND1_MEAN 0.274544 0.005532 49.631 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0886 on 8832 degrees of freedom
## (244 observations deleted due to missingness)
## Multiple R-squared: 0.2181, Adjusted R-squared: 0.218
## F-statistic: 2463 on 1 and 8832 DF, p-value: < 2.2e-16
cor(Analysis_Final_3$LPI_GT_RND1_MEAN,
Analysis_Final_3$LPI_GT_RND2_MEAN,
method = "pearson",
use = "complete.obs")
## [1] 0.4669872
The linear regression model (black dashed line) fitted to the data of all strains together gave a coefficient of determination R2 = 0.22 and a Pearson correlation coefficient r = 0.47.
summary(stats_LPI_GT_Mean_RND1vsRND2_M3_selected)
##
## Call:
## lm(formula = LPI_GT_RND2_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)] ~
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)],
## data = Analysis_Final_3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.40299 -0.05268 -0.00899 0.03756 0.55375
##
## Coefficients:
## Estimate
## (Intercept) 0.000213
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)] 0.581230
## Std. Error
## (Intercept) 0.003588
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)] 0.010154
## t value
## (Intercept) 0.059
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)] 57.240
## Pr(>|t|)
## (Intercept) 0.953
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)] <2e-16
##
## (Intercept)
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)] ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09715 on 872 degrees of freedom
## (85 observations deleted due to missingness)
## Multiple R-squared: 0.7898, Adjusted R-squared: 0.7896
## F-statistic: 3276 on 1 and 872 DF, p-value: < 2.2e-16
cor(Analysis_Final_3$LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)],
Analysis_Final_3$LPI_GT_RND2_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)],
method = "pearson",
use = "complete.obs")
## [1] 0.8887079
The linear regression model (red dashed line) fitted to the data of the acetic acid sensitive and tolerant strains gave an R2 value of 0.79 and a Pearson correlation coefficient r = 0.89.
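For a linear regression with a single predictor, the coefficient of determination equals the square of the Pearson correlation coefficient, which is why the two pairs of values reported above agree. The sketch below checks this on simulated data (the vectors `round1` and `round2` are invented, not screen data) and then on the reported correlation coefficients.

```r
set.seed(1)
# Simulated replicate means; these numbers are not from the screen.
round1 <- rnorm(200)
round2 <- 0.3 * round1 + rnorm(200, sd = 0.5)

fit <- lm(round2 ~ round1)
r_squared <- summary(fit)$r.squared
pearson_r <- cor(round1, round2, method = "pearson")

# With one predictor, R-squared is the squared Pearson correlation.
all.equal(r_squared, pearson_r^2)

# The same relation applied to the correlation coefficients reported above.
round(0.4669872^2, 4)  # 0.2181
round(0.8887079^2, 4)  # 0.7898
```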
Scatterplot showing the relative generation time of each CRISPRi strain in basal condition on the X-axis [Log Strain Coefficient (LSC) of generation time (GT)] and the relative generation time under acetic acid stress (150 mM acetic acid) compared to the basal condition on the Y-axis (LPI_GT). Each point indicates the mean of all replicates (n=6). For 198 acetic acid sensitive strains, the number of replicates is between 3 and 5 (n=3 for 135 strains; n=4 for 16; n=5 for 47), as not all replicates managed to grow under acetic acid stress. The data of the CRISPRi control strains are indicated with green dots. Based on our statistical analysis, strains with FDR-adjusted P-values ≤ 0.1 and mean LPI_GT > 0.165 (maximum LPI_GT of the CRISPRi control strains) are designated as acetic acid sensitive (red dots). Strains with FDR-adjusted P-values ≤ 0.1 and mean LPI_GT < -0.037 (minimum LPI_GT of the CRISPRi control strains) are designated as acetic acid tolerant (blue dots). The LPI_GT threshold is indicated with a gray dashed line. Strains that do not meet both the adjusted P-value and the LPI_GT thresholds are indicated with black dots.
Figure 11 (Fig. 2C in Manuscript): Normalized generation time (LSC GT) of strains in Basal condition vs Relative generation time (LPI GT) of strains in acetic acid condition compared to basal condition
Violin-plots display the spread and the distribution of the LPI GT data for all CRISPRi strains (ALL), and LPI_GT values of CRISPRi control strains
Violin_LPI_Mean_M3 <- data.frame()
R <- length(which((Analysis_Final_3$Control.gRNA==0)
&(!is.na(Analysis_Final_3$LPI_GT_Mean_all))
))
Violin_LPI_Mean_M3[1:R, 1] <- Analysis_Final_3$LPI_GT_Mean_all[which((Analysis_Final_3$Control.gRNA==0)
&(!is.na(Analysis_Final_3$LPI_GT_Mean_all)))]
Violin_LPI_Mean_M3[1:R, 2] <- "ALL"
R2 <- length(which(Analysis_Final_3$Control.gRNA==1))
Violin_LPI_Mean_M3[(R+1):(R+R2), 1] <- Analysis_Final_3$LPI_GT_Mean_all[which(Analysis_Final_3$Control.gRNA==1)]
Violin_LPI_Mean_M3[(R+1):(R+R2), 2] <- "CONTROL"
colnames(Violin_LPI_Mean_M3)[1:2] <- c("Mean", "Label")
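The same two-group table can be built more compactly by binding two small data frames. This is only a sketch on invented data: `toy` stands in for Analysis_Final_3, and `violin_toy` for the table assembled above.

```r
# Invented stand-in for Analysis_Final_3 with the two columns used above.
toy <- data.frame(
  Control.gRNA    = c(0, 0, 0, 1, 1),
  LPI_GT_Mean_all = c(0.30, NA, -0.12, 0.02, -0.01)
)

# Library strains with a measured LPI_GT, labelled "ALL".
library_rows <- which(toy$Control.gRNA == 0 & !is.na(toy$LPI_GT_Mean_all))
# Control strains, labelled "CONTROL" (missing values are kept, as in the code above).
control_rows <- which(toy$Control.gRNA == 1)

violin_toy <- rbind(
  data.frame(Mean = toy$LPI_GT_Mean_all[library_rows], Label = "ALL"),
  data.frame(Mean = toy$LPI_GT_Mean_all[control_rows], Label = "CONTROL")
)
violin_toy
```

Using which() rather than a bare logical index keeps rows with a missing flag out of the result, which matches the behaviour of the original code.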
Figure 12 (Fig. 2C INSET, in Manuscript): Violin-plots display the spread and the distribution of the LPI GT data
We display the gene names that are highly represented among the tolerant and the sensitive strains, i.e. genes for which CRISPRi targeting with multiple gRNAs gave the tolerant / sensitive phenotype. The relationship between CRISPRi repression of a gene and the observed phenotype is more reliable for these highly represented genes.
## Loading required package: RColorBrewer
Figure 13: Wordcloud for CRISPRi gene targets of acetic acid tolerant strains
Figure 14: Wordcloud for CRISPRi gene targets of acetic acid sensitive strains
First, assign the gRNA names as row names of the Analysis_Final_3 data.frame
row.names(Analysis_Final_3) <- Analysis_Final_3$gRNA_name
Next, for this graph we fetch some additional information from a .csv file available as supplementary data in Smith et al., 2017. The file is also available in our COMPILED_DATA folder
Supplementary data from Smith et al., 2017: smith_YEPGdata.csv
Smith_Yepg_data <- read.csv("COMPILED_DATA/smith_YEPGdata.csv", na.strings = "")
str(Smith_Yepg_data)
## 'data.frame': 8939 obs. of 28 variables:
## $ ORF : chr "YBL074C" "YBL074C" "YBL074C" "YBL074C" ...
## $ gRNA_targeting_seq : chr "CCAGCGATAAGGAGGATCTT" "TGTGTCCTTTCTTCATCTCT" "AAAAGGAAAAAGTAATTAGG" "GTGAAAAGGAAAAAGTAATT" ...
## $ Midpoint_TSS_dist : int -36 -135 -187 -190 -33 -90 -139 -178 -11 -15 ...
## $ Norm_atac_seq_read_density: num 0.56 0.93 0.62 0.62 0.54 0.55 0.53 0.41 0.28 0.28 ...
## $ Multiple_ORFs_Targeted : int 0 0 0 0 0 0 1 1 0 0 ...
## $ nearby_genes : chr NA NA NA NA ...
## $ gene_name : chr "AAR2" "AAR2" "AAR2" "AAR2" ...
## $ guide_id : chr "AAR2-NRg-3" "AAR2-NRg-4" "AAR2-TRg-15" "AAR2-TRg-16" ...
## $ oligo_seq : chr "GGGAGCTGCGATTGGCAGCCAGCGATAAGGAGGATCTTGTTTTAGAGCTAGAAATAGCAAG" "GGGAGCTGCGATTGGCAGTGTGTCCTTTCTTCATCTCTGTTTTAGAGCTAGAAATAGCAAG" "GGGAGCTGCGATTGGCAGAAAAGGAAAAAGTAATTAGGGTTTTAGAGCTAGAAATAGCAAG" "GGGAGCTGCGATTGGCAGGTGAAAAGGAAAAAGTAATTGTTTTAGAGCTAGAAATAGCAAG" ...
## $ YPEG_._ATC1 : num 16.8 963.5 193.5 153.9 78.3 ...
## $ YPEG_._ATC2 : num 15 1010.4 208.2 136.5 46.2 ...
## $ YPEG_._ATC3 : num 14.3 896.3 268.5 123.7 34.5 ...
## $ YPEG4 : num 55.9 815.5 447.3 276.6 133.9 ...
## $ YPEG5 : num 54 577 427 450 159 ...
## $ YPEG6 : num 60.3 722.8 311.5 312.5 91.6 ...
## $ YPD_._ATC7 : num 28.3 687.2 349.5 258.9 111.6 ...
## $ YPD_._ATC8 : num 29.1 560.2 376.3 199.9 88.7 ...
## $ YPD_._ATC9 : num 54.4 816.7 338.7 227.5 124.8 ...
## $ YPD10 : num 11.5 639 604.1 405.4 121.7 ...
## $ YPD11 : num 46.7 688 528.5 380.2 141.8 ...
## $ YPD12 : num 57 809 479 213 153 ...
## $ Pool : int 2 2 2 2 1 2 1 1 1 1 ...
## $ Log2_YPEG : num -1.884 0.441 -0.823 -1.327 -1.276 ...
## $ Log2_YPD : num -0.043 -0.0491 -0.598 -0.5412 -0.3587 ...
## $ YPEG_filter_25 : int 1 1 1 1 1 1 1 0 1 1 ...
## $ YPD_filter_25 : int 1 1 1 1 1 1 1 0 1 1 ...
## $ ORF_Category : chr "Essential" "Essential" "Essential" "Essential" ...
## $ RNA_structure.Kcal.Mol. : num -60.6 -56.8 -50.5 -55.3 -56.4 -52.7 -54.6 -57.7 -60.7 -54.1 ...
Of the several columns, the most useful for this study are Midpoint_TSS_dist, Norm_atac_seq_read_density, Multiple_ORFs_Targeted and nearby_genes (columns 3 to 6).
Add only these four columns to the Analysis_Final_3 data.frame
for(i in 1:nrow(Analysis_Final_3)){
x <- which(row.names(Analysis_Final_3)[i]==Smith_Yepg_data$guide_id)
if(length(x)==0){
Analysis_Final_3[i, 104:107] <- NA
} else {
Analysis_Final_3[i, 104:107] <- Smith_Yepg_data[x, 3:6]
}
}
colnames(Analysis_Final_3)[104:107] <- colnames(Smith_Yepg_data)[3:6]
str((Analysis_Final_3)[104:107])
## 'data.frame': 9078 obs. of 4 variables:
## $ Midpoint_TSS_dist : int -36 -135 -187 -190 -33 -90 -139 -178 -11 -15 ...
## $ Norm_atac_seq_read_density: num 0.56 0.93 0.62 0.62 0.54 0.55 0.53 0.41 0.28 0.28 ...
## $ Multiple_ORFs_Targeted : int 0 0 0 0 0 0 1 1 0 0 ...
## $ nearby_genes : chr NA NA NA NA ...
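The row-by-row lookup above can also be written as a single vectorised step with match(). The sketch uses two invented miniature tables, `analysis_toy` and `smith_toy`; the two AAR2 guides and their values are taken from the output above. It assumes that each guide_id occurs at most once, since match() returns only the first hit.

```r
# Invented miniature versions of the two tables.
analysis_toy <- data.frame(
  gRNA_name = c("AAR2-NRg-3", "Ctrl-CC23", "AAR2-NRg-4"),
  stringsAsFactors = FALSE
)
smith_toy <- data.frame(
  guide_id                   = c("AAR2-NRg-4", "AAR2-NRg-3"),
  Midpoint_TSS_dist          = c(-135L, -36L),
  Norm_atac_seq_read_density = c(0.93, 0.56),
  stringsAsFactors = FALSE
)

# For each strain, the row of its first matching guide_id (NA if absent).
hit <- match(analysis_toy$gRNA_name, smith_toy$guide_id)

# Indexing with NA yields NA rows, so unmatched strains are filled with NA.
wanted <- c("Midpoint_TSS_dist", "Norm_atac_seq_read_density")
analysis_toy[, wanted] <- smith_toy[hit, wanted]
analysis_toy
```

Selecting columns by name rather than by position also avoids hard-coded column indices such as 104:107.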
Estimate the gRNA frequency
gRNA_Freq <- data.frame(sort(table(Analysis_Final_3$GENE), decreasing = TRUE))
Plot the graphs
Figure 15 (Figure S7 in Manuscript): Histogram of the number of strains per target gene in the CRISPRi library (TOP PANEL). Histogram of the gRNA distance from the transcription start site (TSS) of the genes (BOTTOM PANEL)
First the LSC GT mean under acetic acid stress was recalculated for all the strains excluding the missing values
for(i in 1:nrow(Analysis_Final_3)){
test1 <- t(Analysis_Final_3[i, 22:27])
x1 <- sum(!is.na(test1[, 1]))
AA_GT_Mean_temp <- mean(test1[which(!is.na(test1[, 1]))])
Analysis_Final_3[i, 32] <- AA_GT_Mean_temp
}
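A vectorised equivalent of this loop is rowMeans() with na.rm = TRUE. The sketch below uses an invented table `replicates` rather than columns 22 to 27 of Analysis_Final_3.

```r
# Invented replicate values for three strains; NA marks a replicate that did not grow.
replicates <- data.frame(
  R1 = c(0.20, NA, NA),
  R2 = c(0.30, 0.50, NA),
  R3 = c(0.40, 0.70, NA)
)

# Mean over the available replicates; a row without any data gives NaN, as mean() does.
row_mean <- rowMeans(replicates, na.rm = TRUE)
# Number of replicates that contributed to each mean.
n_used <- rowSums(!is.na(replicates))

row_mean
n_used
```

Keeping the replicate count next to the mean makes it easy to see which means rest on fewer observations.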
Plot the histogram
Figure 16 (fig. 2B in manuscript): Histogram of strain growth in basal condition (TOP PANEL). Histogram of strain growth at 150 mM acetic acid (BOTTOM PANEL)
for(i in 1:nrow(whole_Gene_list_Final)){
x <- as.character(unique(Smith_Yepg_data$ORF_Category[(Smith_Yepg_data$gene_name %in% whole_Gene_list_Final$LIB_ID[i])]))
if(length(x)==0){
whole_Gene_list_Final[i, 8] <- NA
} else{
whole_Gene_list_Final[i, 8] <- x
}
}
colnames(whole_Gene_list_Final)[8] <- "ORF_Category"
whole_Gene_list_Final$ORF_Category <- as.factor(whole_Gene_list_Final$ORF_Category)
#Missing values were obtained from SGD
whole_Gene_list_Final$ORF_Category[which(is.na(whole_Gene_list_Final$ORF_Category))] <- c("Respiratory",
"Other",
"Essential",
"Essential",
"Respiratory",
"Other",
"Respiratory",
"Respiratory",
"Other",
"Respiratory",
"Essential",
"Essential",
"Respiratory")
str(whole_Gene_list_Final)
## 'data.frame': 1617 obs. of 8 variables:
## $ LIB_ID : chr "AAR2" "AAT1" "AAT2" "ABD1" ...
## $ SGD_DB_ID : chr "S000000170" "S000001589" "S000004017" "S000000440" ...
## $ SYS_ID : chr "YBL074C" "YKL106W" "YLR027C" "YBR236C" ...
## $ GENE_SYM : chr "AAR2" "AAT1" "AAT2" "ABD1" ...
## $ NAME : chr "A1-Alpha2 Repression" "Aspartate AminoTransferase" "Aspartate AminoTransferase" NA ...
## $ PHENOTYPE : chr "Essential gene; conditional mutant is heat sensitive, loses viability at elevated temperature and displays elev"| __truncated__ "Non-essential gene; null mutant has a reduced respiratory growth rate and decreased competitive fitness on non-"| __truncated__ "Non-essential gene in S288C, but essential in the Sigma1278b background; S288C null mutant displays a decreased"| __truncated__ "Essential gene; temperature-sensitive mutation causes decreasing protein synthesis upon temperature shift; repr"| __truncated__ ...
## $ DESCRIPTION : chr "Component of the U5 snRNP complex; required for splicing of U3 precursors; originally described as a splicing f"| __truncated__ "Mitochondrial aspartate aminotransferase; catalyzes the conversion of oxaloacetate to aspartate in aspartate an"| __truncated__ "Cytosolic aspartate aminotransferase involved in nitrogen metabolism; localizes to peroxisomes in oleate-grown cells" "Methyltransferase; catalyzes the transfer of a methyl group from S-adenosylmethionine to the GpppN terminus of "| __truncated__ ...
## $ ORF_Category: Factor w/ 3 levels "Essential","Other",..: 1 3 3 1 1 3 1 3 1 1 ...
The results from the scan-o-matic phenomics were validated in a liquid micro-cultivation growth experiment in the Bioscreen
Some CRISPRi strains were selected for a liquid growth experiment in the Bioscreen to identify an ATc concentration that induces a growth inhibition in liquid YNB medium (basal condition) similar to the one we observed in our Quantitative spot test assay on YNB agar medium with 7.5 ug/ml of ATc. Here we analyze that data set. These strains were selected based on the competitive growth assay of the CRISPRi library in liquid YPD medium with and without 250 ng/ml of ATc (Smith et al., 2017).
Compiled ATc titration data: ATc_liq_titer_data.csv
Atc_liq_data <- read.csv("COMPILED_DATA/ATc_liq_titer_data.csv", na.strings = "NaN", header = TRUE)
str(Atc_liq_data)
## 'data.frame': 80 obs. of 9 variables:
## $ Well_No : int 9 19 29 39 49 59 69 79 89 99 ...
## $ gRNA_name : chr "ACT1-NRg-5" "ACT1-NRg-5" "ACT1-NRg-5" "ACT1-NRg-5" ...
## $ Atc_concentration: num 0 0.25 1 2 3 5 7.5 10 15 25 ...
## $ Lag_R1 : num 5.95 5.81 5.89 2.87 3.95 ...
## $ GT_R1 : num 1.96 2.03 2.16 3.9 5.06 ...
## $ Yield_R1 : num 2.98 3.17 2.77 1.42 1.03 ...
## $ Lag_R2 : num 5.53 5.69 5.49 3.62 3.97 ...
## $ GT_R2 : num 2 2.02 2.17 4.59 4.88 ...
## $ Yield_R2 : num 2.87 3.02 2.96 1.21 1.05 ...
Note that in the liquid experiment, three phenotypes were estimated, i.e. growth LAG phase, GENERATION TIME and biomass YIELD
Atc_liq_cc23 <- Atc_liq_data[which(Atc_liq_data$gRNA_name=="Ctrl-CC23"), ]
uniq_gRNA <- unique(Atc_liq_data$gRNA_name)
uniq_conc <- unique(Atc_liq_data$Atc_concentration)
Atc_liq_data[, 10:15] <- log(Atc_liq_data[, 4:9])
To determine the normalized growth, or Log Strain Coefficient (LSC) values, we use the data of the control strain CC23. Subtracting the log-transformed growth phenotypes of CC23 from the log-transformed phenotypes of a strain at the same ATc concentration gives the LSC value of that strain at that condition for each phenotype.
for(i in 1:length(uniq_gRNA)){
for(j in 1:length(uniq_conc)){
Atc_liq_data[which(Atc_liq_data$gRNA_name==uniq_gRNA[i]&
Atc_liq_data$Atc_concentration==uniq_conc[j]), 16:21] <-
Atc_liq_data[which(Atc_liq_data$gRNA_name==uniq_gRNA[i]&
Atc_liq_data$Atc_concentration==uniq_conc[j]), 10:15] -
Atc_liq_data[which(Atc_liq_data$gRNA_name=="Ctrl-CC23"&
Atc_liq_data$Atc_concentration==uniq_conc[j]), 10:15]
}
}
for(i in 1:nrow(Atc_liq_data)){
Atc_liq_data[i, 22] <- mean(as.numeric(Atc_liq_data[i, c(16, 19)][which(!is.na(Atc_liq_data[i, c(16, 19)]))]))
Atc_liq_data[i, 23] <- sd(as.numeric(Atc_liq_data[i, c(16, 19)][which(!is.na(Atc_liq_data[i, c(16, 19)]))]))
Atc_liq_data[i, 24] <- mean(as.numeric(Atc_liq_data[i, c(17, 20)][which(!is.na(Atc_liq_data[i, c(17, 20)]))]))
Atc_liq_data[i, 25] <- sd(as.numeric(Atc_liq_data[i, c(17, 20)][which(!is.na(Atc_liq_data[i, c(17, 20)]))]))
Atc_liq_data[i, 26] <- mean(as.numeric(Atc_liq_data[i, c(18, 21)][which(!is.na(Atc_liq_data[i, c(18, 21)]))]))
Atc_liq_data[i, 27] <- sd(as.numeric(Atc_liq_data[i, c(18, 21)][which(!is.na(Atc_liq_data[i, c(18, 21)]))]))
}
colnames(Atc_liq_data)[10:15] <- paste0("log_", colnames(Atc_liq_data)[4:9])
colnames(Atc_liq_data)[16:21] <- paste0("LSC_", colnames(Atc_liq_data)[4:9])
colnames(Atc_liq_data)[22:23] <- paste0(c("Mean_", "SD_"), "LSC_Lag")
colnames(Atc_liq_data)[24:25] <- paste0(c("Mean_", "SD_"), "LSC_GT")
colnames(Atc_liq_data)[26:27] <- paste0(c("Mean_", "SD_"), "LSC_Yield")
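The LSC calculation can be followed on a small worked example. The table `toy` and its generation times are invented; only the column names and the control name Ctrl-CC23 follow the data set above. As in the code above, log() is the natural logarithm.

```r
# Invented generation times (hours) at two ATc concentrations.
toy <- data.frame(
  gRNA_name         = c("Ctrl-CC23", "Ctrl-CC23", "ACT1-NRg-5", "ACT1-NRg-5"),
  Atc_concentration = c(0, 7.5, 0, 7.5),
  GT_R1             = c(2.0, 2.0, 2.0, 4.0),
  stringsAsFactors = FALSE
)

# Control value at each concentration, looked up by concentration.
ctrl <- toy[toy$gRNA_name == "Ctrl-CC23", ]
ctrl_gt <- ctrl$GT_R1[match(toy$Atc_concentration, ctrl$Atc_concentration)]

# LSC = log(strain) - log(control) at the same ATc concentration.
toy$LSC_GT_R1 <- log(toy$GT_R1) - log(ctrl_gt)
toy
```

In this toy case the control has an LSC of 0 by construction, and the strain at 7.5 ug/ml has an LSC of log(2), i.e. its generation time is twice that of the control at the same concentration.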
First, make a subset that includes only the data for the following gRNA names: ACT1-NRg-5, ACT1-NRg-8, SEC21-NRg-5, VPS1-TRg-1. These gRNAs were previously shown to induce strong CRISPRi-mediated repression that ultimately caused lethality or very poor growth. These strains were also used for the ATc titration on YNB agar plates by Qualitative Spot-Test Assay. We also include another CRISPRi control strain, Ctrl-CC11, to show how it performed compared to the other strains.
name_gRNA_atc_titer <- c("ACT1-NRg-5", "ACT1-NRg-8", "Ctrl-CC11", "SEC21-NRg-5", "VPS1-TRg-1")
test <- data.frame()
Atc_titer_subset <- data.frame()
for(i in 1:length(name_gRNA_atc_titer)){
test <- Atc_liq_data[which(Atc_liq_data$gRNA_name==name_gRNA_atc_titer[i]), ]
Atc_titer_subset <- rbind(Atc_titer_subset, test)
}
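The same subset can be obtained without a loop by using %in%. The sketch below works on an invented table `toy` (the name RPN9-NRg-1 is made up for illustration) and shows how to keep the per-gRNA grouping that the loop produces.

```r
# Invented rows; only the column names follow Atc_liq_data.
toy <- data.frame(
  gRNA_name         = c("VPS1-TRg-1", "ACT1-NRg-5", "RPN9-NRg-1", "ACT1-NRg-5"),
  Atc_concentration = c(0, 0, 0, 7.5),
  stringsAsFactors = FALSE
)
keep <- c("ACT1-NRg-5", "VPS1-TRg-1")

# %in% keeps the selected rows in their original order.
subset_in <- toy[toy$gRNA_name %in% keep, ]

# Ordering by position in 'keep' reproduces the grouping of the loop above.
subset_ordered <- subset_in[order(match(subset_in$gRNA_name, keep)), ]
subset_ordered$gRNA_name
```

The row order only matters if later steps, such as plotting, rely on the strains appearing in the order given in the name vector.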
Figure 17 (fig. S8B in manuscript): Scatter plot to display ATC dosage response on Lag phase
Figure 18 (fig. S8B in manuscript): Scatter plot to display ATC dosage response on Generation time
Figure 19 (fig. S8B in manuscript): Scatter plot to display ATC dosage response on Yield
To validate the acetic acid sensitivity or tolerance observed for the CRISPRi strains in the scan-o-matic screening, selected strains were grown in liquid YNB medium using the Bioscreen platform. The 48 most acetic acid sensitive and the 50 most tolerant CRISPRi strains from the scan-o-matic analysis were selected for the validation (we initially took the 50 most sensitive strains, but two were eliminated because of their poor growth in basal condition). In addition, all CRISPRi strains with gRNAs targeting any of the following 12 genes were included: RPT4, RPN9, PRE4, MRPL10, MRPL4, SEC27, MIA40, VPS45, PUP3, VMA3, SEC62, COG1. This made a total of 176 strains, which were grown together with 7 control strains in liquid medium.
select_genes <- read.table("COMPILED_DATA/selected_genes.txt", header = FALSE, sep = "\t", as.is = TRUE)
y <- vector(mode = "numeric", length = 0)
for(i in 1:length(select_genes$V1)){
x <- which(Analysis_Final_3$GENE==select_genes$V1[i])
y <- c(y, x)
}
select_strains <- Analysis_Final_3$gRNA_name[y]
50 most tolerant and sensitive strains from scan-o-matic : bottom_top_50.csv
bot_top_50 <- read.csv("COMPILED_DATA/bottom_top_50.csv", stringsAsFactors = FALSE, header = TRUE)
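#Removing the strains that match the exclusion criteria; setdiff() keeps all rows when no strain matches
poor_basal <- which(bot_top_50$n_CTRL<4 & bot_top_50$CTRL_GT_Mean_all > 0.1)
bot_top_50 <- bot_top_50[setdiff(seq_len(nrow(bot_top_50)), poor_basal), ]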
bot_top_50_strains <- bot_top_50$gRNA_name
validation_strains <- as.character(union(bot_top_50_strains, select_strains))
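The selection logic above (all gRNAs of a gene list, merged with a ranked strain list) can be sketched on a toy table. This is only an illustration: the data frame `lib` and all gRNA names except ACT1-NRg-5 are made up, and `%in%` is used here in place of the loop.

```r
# Toy stand-in for Analysis_Final_3 (hypothetical rows, not real library data)
lib <- data.frame(
  gRNA_name = c("RPT4-NRg-1", "RPT4-TRg-2", "VMA3-NRg-1", "ACT1-NRg-5"),
  GENE      = c("RPT4", "RPT4", "VMA3", "ACT1"),
  stringsAsFactors = FALSE
)

# All gRNAs that target any gene of interest
genes   <- c("RPT4", "VMA3")
by_gene <- lib$gRNA_name[lib$GENE %in% genes]

# Hypothetical ranked list (stand-in for bot_top_50$gRNA_name)
ranked <- c("VMA3-NRg-1", "ACT1-NRg-5")

# union() removes the strain that appears in both lists
selected <- union(ranked, by_gene)
print(selected)
```

Note that `%in%` returns the strains in the row order of the table, whereas the loop returns them in the order of the gene list; after `union()` the set of strains is the same either way.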
These strains were extracted from the main collection and arrayed in two 96-well microtiter plates. The plate layout is available in the “/RAW_DATA/BS_VAL_SCR/STRAIN_MAP_VAL_EXP” folder
Plate layout : Plate_layout_Liquid_Growth_Exp.xlsx
The strains were grown in liquid YNB medium (basal condition) and in liquid YNB medium supplemented with 150mM (Experiment number 1-3) or 125mM (Experiment number 4-6) of acetic acid. For each strain, 3 independent replicates were included for each growth condition.
The raw data of bioscreen runs are saved in the BS_VAL_SCR folder within the RAW_DATA folder. The raw files are organized in the following format 20200511_VAL_PLATE[microtiter plate number].[Experiment number]_CTRL_AA[acetic acid concentration in mM]_Trimmed
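As a minimal sketch of how the plate number, experiment number and acetic acid concentration could be recovered from such file names, the example below parses two hypothetical names built from the pattern above. The regular expression and the column names of `run_info` are assumptions, and any file extension is left out.

```r
# Hypothetical file names following the naming pattern described above
files <- c("20200511_VAL_PLATE1.1_CTRL_AA150_Trimmed",
           "20200511_VAL_PLATE2.4_CTRL_AA125_Trimmed")

# Capture groups: plate number, experiment number, acetic acid concentration (mM)
pattern <- "^[0-9]{8}_VAL_PLATE([0-9]+)\\.([0-9]+)_CTRL_AA([0-9]+)_Trimmed$"
parts   <- regmatches(files, regexec(pattern, files))

# Element 1 of each match is the full string; elements 2-4 are the capture groups
run_info <- data.frame(
  file       = files,
  plate      = as.integer(sapply(parts, `[`, 2)),
  experiment = as.integer(sapply(parts, `[`, 3)),
  aa_mM      = as.integer(sapply(parts, `[`, 4)),
  stringsAsFactors = FALSE
)
print(run_info)
```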
In each raw file, the growth data of the strains in the basal condition are in wells 1-100, and the growth data under acetic acid stress (at the concentration indicated in the raw file name) are in wells 101-200, in the same strain order. For each plate, the wells can be linked to the strain names (gRNA names) using the plate layout file mentioned above in VALIDATION DATA ANALYSIS.
For the ease of analysis, the bioscreen raw data was compiled and saved as a .csv file in the COMPILED_DATA folder
bioscreen compiled data : Validation_Bioscreen_data.csv
Val_data <- read.csv("COMPILED_DATA/Validation_Bioscreen_data.csv", na.strings = "NaN", header = TRUE)
str(Val_data)
## 'data.frame': 200 obs. of 40 variables:
## $ Container.Name : chr "Well 1" "Well 2" "Well 3" "Well 4" ...
## $ gRNA_name : chr "RSM28-TRg-6" "RGL1-NRg-7" "COG1-NRg-3" "COG1-TRg-1" ...
## $ CTRL_GT_Mean_all: chr "0.006053539" "-0.039136252" "-0.04627155" "-0.011279953" ...
## $ LPI_GT_Mean_all : chr "-0.191825576" "-0.278454226" "0.166542866" "0.490160515" ...
## $ Ctrl_Lag_R1 : num 2.27 2.09 2.2 2.24 2.13 ...
## $ Ctrl_GT_R1 : num 2.65 2.57 2.61 2.7 2.59 ...
## $ Ctrl_Yield_R1 : num 1.84 2.04 1.97 2 1.97 ...
## $ AA150_Lag_R1 : num 54.5 40.8 47 47.1 46.7 ...
## $ AA150_GT_R1 : num 20 18.3 17.8 27.2 14.6 ...
## $ AA150_Yield_R1 : num 0.225 0.25 0.294 0.208 0.475 ...
## $ Ctrl_Lag_R2 : num 2.09 1.96 2.03 2.1 1.97 ...
## $ Ctrl_GT_R2 : num 2.94 2.53 2.58 2.68 2.57 ...
## $ Ctrl_Yield_R2 : num 1.86 2.24 2.1 1.96 2.06 ...
## $ AA150_Lag_R2 : num 38.9 34.7 36.5 35.7 34.2 ...
## $ AA150_GT_R2 : num 18.7 10.4 15.7 23.6 18.8 ...
## $ AA150_Yield_R2 : num 0.246 0.654 0.408 0.26 0.4 ...
## $ Ctrl_Lag_R3 : num 2.33 2.36 2.25 2.35 2.19 ...
## $ Ctrl_GT_R3 : num 2.8 2.74 2.66 2.75 2.5 ...
## $ Ctrl_Yield_R3 : num 2.04 2.06 2.19 2.24 2.61 ...
## $ AA150_Lag_R3 : num 60 49.9 54.3 57 49.7 ...
## $ AA150_GT_R3 : num 29.2 10.8 12.6 35.7 14.3 ...
## $ AA150_Yield_R3 : num 0.0891 0.401 0.3597 0.0628 0.5105 ...
## $ Ctrl_Lag_R4 : num 2.28 2.25 2.19 2.25 2.12 ...
## $ Ctrl_GT_R4 : num 2.92 2.75 2.75 2.89 2.6 ...
## $ Ctrl_Yield_R4 : num 2.21 2.03 1.94 1.76 2.38 ...
## $ AA125_Lag_R4 : num 20.2 18.7 21.6 24.8 20.2 ...
## $ AA125_GT_R4 : num 8.83 6.23 7.4 8.65 6.63 ...
## $ AA125_Yield_R4 : num 0.57 1.016 0.729 0.869 1.028 ...
## $ Ctrl_Lag_R5 : num 2.46 2.39 2.52 2.6 2.68 ...
## $ Ctrl_GT_R5 : num 2.66 2.56 2.75 2.8 2.72 ...
## $ Ctrl_Yield_R5 : num 2.2 2.42 1.86 2.02 1.9 ...
## $ AA125_Lag_R5 : num 18.6 16.2 18.7 18.3 17.2 ...
## $ AA125_GT_R5 : num 5.28 4.81 5.05 6.33 5.16 ...
## $ AA125_Yield_R5 : num 1.74 1.9 1.75 1.73 1.85 ...
## $ Ctrl_Lag_R6 : num 2.18 2.21 2.15 2.19 2.2 ...
## $ Ctrl_GT_R6 : num 2.78 2.77 2.73 2.79 2.65 ...
## $ Ctrl_Yield_R6 : num 2.27 2.07 2.07 2.23 2.26 ...
## $ AA125_Lag_R6 : num 20.7 18.2 20.9 22.3 20.3 ...
## $ AA125_GT_R6 : num 5.92 5.36 5.7 7.13 5.48 ...
## $ AA125_Yield_R6 : num 1.4 1.51 1.5 1.33 1.58 ...
The first column of the dataset Val_data is Container.Name, which ranges from “Well 1” to “Well 200”. The first 100 rows (Well 1 to Well 100) hold the data of strains from microtiter plate 1 and the next 100 rows (Well 101 to Well 200) hold the data of strains from microtiter plate 2
Val_data[, 41:76] <- log(Val_data[, 5:40])
colnames(Val_data)[41:76] <- paste0("log_", colnames(Val_data)[5:40])
We estimated the LSC values for the scan-o-matic data using the CC23 strain, so we employ the same strategy for this Bioscreen experiment. CC23 was present in each plate (plate 1 and 2) and in every replicate (three replicates for each acetic acid condition and six replicates for the basal/Ctrl condition). However, it failed to grow in one of the replicates with 150mM of acetic acid in plate 2. Therefore, to obtain a relative estimate (LSC), we first average the CC23 response for each plate and each condition and then subtract this average from the response of each strain in the respective plate and condition. This gives a relative estimate, the logarithmic strain coefficient (LSC), for each strain.
Val_data_cc <- Val_data[which(Val_data$gRNA_name=="CC23"), ]
for(j in 1:2){
#Ctrl_Lag
Val_data_cc[j, 77] <- mean(as.numeric(Val_data_cc[j, c(41, 47, 53, 59, 65, 71)][which(!is.na(Val_data_cc[j, c(41, 47, 53, 59, 65, 71)]))]))
#Ctrl_GT
Val_data_cc[j, 78] <- mean(as.numeric(Val_data_cc[j, c(42, 48, 54, 60, 66, 72)][which(!is.na(Val_data_cc[j, c(42, 48, 54, 60, 66, 72)]))]))
#Ctrl_Yield
Val_data_cc[j, 79] <- mean(as.numeric(Val_data_cc[j, c(43, 49, 55, 61, 67, 73)][which(!is.na(Val_data_cc[j, c(43, 49, 55, 61, 67, 73)]))]))
#AA150_Lag
Val_data_cc[j, 80] <- mean(as.numeric(Val_data_cc[j, c(44, 50, 56)][which(!is.na(Val_data_cc[j, c(44, 50, 56)]))]))
#AA150_GT
Val_data_cc[j, 81] <- mean(as.numeric(Val_data_cc[j, c(45, 51, 57)][which(!is.na(Val_data_cc[j, c(45, 51, 57)]))]))
#AA150_Yield
Val_data_cc[j, 82] <- mean(as.numeric(Val_data_cc[j, c(46, 52, 58)][which(!is.na(Val_data_cc[j, c(46, 52, 58)]))]))
#AA125_Lag
Val_data_cc[j, 83] <- mean(as.numeric(Val_data_cc[j, c(62, 68, 74)][which(!is.na(Val_data_cc[j, c(62, 68, 74)]))]))
#AA125_GT
Val_data_cc[j, 84] <- mean(as.numeric(Val_data_cc[j, c(63, 69, 75)][which(!is.na(Val_data_cc[j, c(63, 69, 75)]))]))
#AA125_Yield
Val_data_cc[j, 85] <- mean(as.numeric(Val_data_cc[j, c(64, 70, 76)][which(!is.na(Val_data_cc[j, c(64, 70, 76)]))]))
}
colnames(Val_data_cc)[77:79] <- paste0("Mean_Ctrl_", c("Lag", "GT", "Yield"))
colnames(Val_data_cc)[80:82] <- paste0("Mean_AA150_", c("Lag", "GT", "Yield"))
colnames(Val_data_cc)[83:85] <- paste0("Mean_AA125_", c("Lag", "GT", "Yield"))
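The averaging over the replicates that managed to grow can also be written with `rowMeans(..., na.rm = TRUE)`. The sketch below uses a toy stand-in for the CC23 rows; the values and the row names are made up for illustration.

```r
# Toy stand-in for the CC23 rows of Val_data, one row per plate (made-up values)
cc <- data.frame(
  log_AA150_GT_R1 = c(3.10, 3.05),
  log_AA150_GT_R2 = c(3.20, NA),    # NA mimics a replicate that failed to grow
  log_AA150_GT_R3 = c(3.00, 3.15),
  row.names = c("plate1", "plate2")
)

# na.rm = TRUE averages only the replicates that are not NA
cc$Mean_AA150_GT <- rowMeans(cc[, 1:3], na.rm = TRUE)
print(cc)
```

Like `mean()` on an NA-filtered vector, `rowMeans()` with `na.rm = TRUE` returns NaN for a row in which every replicate is missing.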
Now we use the CC23 response to calculate the LSC values. The mean of the log-transformed phenotypes (as calculated above) was determined separately for plate 1 and plate 2 and then subtracted from the respective phenotypic response of each strain,
i.e. log_Phenotype_Strain - mean_log_Phenotype_CC23
The first 100 rows in Val_data are from plate 1. Therefore, we subtract mean_log_Phenotype_CC23_Plate1 from this set, and the corresponding plate 2 mean from rows 101-200.
#Plate 1 (rows 1-100), Ctrl and AA150 phenotypes
#Replicate1
for(i in 1:100){
Val_data[i, 77:82] <- Val_data[i, 41:46]-Val_data_cc["10", 77:82]
}
#Replicate2
for(i in 1:100){
Val_data[i, 83:88] <- Val_data[i, 47:52]-Val_data_cc["10", 77:82]
}
#Replicate3
for(i in 1:100){
Val_data[i, 89:94] <- Val_data[i, 53:58]-Val_data_cc["10", 77:82]
}
#Plate 2 (rows 101-200), Ctrl and AA150 phenotypes
#Replicate1
for(i in 101:200){
Val_data[i, 77:82] <- Val_data[i, 41:46]-Val_data_cc["110", 77:82]
}
#Replicate2
for(i in 101:200){
Val_data[i, 83:88] <- Val_data[i, 47:52]-Val_data_cc["110", 77:82]
}
#Replicate3
for(i in 101:200){
Val_data[i, 89:94] <- Val_data[i, 53:58]-Val_data_cc["110", 77:82]
}
#Plate 1 (rows 1-100), Ctrl and AA125 phenotypes
#Replicate1 (R4)
for(i in 1:100){
Val_data[i, 95:100] <- Val_data[i, 59:64]-Val_data_cc["10", c(77:79, 83:85)]
}
#Replicate2
for(i in 1:100){
Val_data[i, 101:106] <- Val_data[i, 65:70]-Val_data_cc["10", c(77:79, 83:85)]
}
#Replicate3
for(i in 1:100){
Val_data[i, 107:112] <- Val_data[i, 71:76]-Val_data_cc["10", c(77:79, 83:85)]
}
#Plate 2 (rows 101-200), Ctrl and AA125 phenotypes
#Replicate1 (R4)
for(i in 101:200){
Val_data[i, 95:100] <- Val_data[i, 59:64]-Val_data_cc["110", c(77:79, 83:85)]
}
#Replicate2
for(i in 101:200){
Val_data[i, 101:106] <- Val_data[i, 65:70]-Val_data_cc["110", c(77:79, 83:85)]
}
#Replicate3
for(i in 101:200){
Val_data[i, 107:112] <- Val_data[i, 71:76]-Val_data_cc["110", c(77:79, 83:85)]
}
colnames(Val_data)[77:112] <- paste0("LSC_", colnames(Val_data)[5:40])
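The plate-wise subtraction can be illustrated on a small toy table. This is a simplified sketch with one phenotype and one replicate per strain; the data frame `toy`, the `plate` column and all values are assumptions made for the example.

```r
# Toy log-transformed generation times for four wells on two plates (made-up values)
toy <- data.frame(
  gRNA_name = c("CC23", "StrainA", "CC23", "StrainB"),
  plate     = c(1, 1, 2, 2),
  log_GT    = c(1.00, 1.20, 0.90, 0.85),
  stringsAsFactors = FALSE
)

# Mean log phenotype of the control strain on each plate
is_ctrl   <- toy$gRNA_name == "CC23"
ctrl_mean <- tapply(toy$log_GT[is_ctrl], toy$plate[is_ctrl], mean)

# LSC = log phenotype of the strain - mean log phenotype of CC23 on the same plate
toy$LSC_GT <- toy$log_GT - as.numeric(ctrl_mean[as.character(toy$plate)])
print(toy)
```

By construction the control strain has an LSC of 0 on its own plate, and a negative LSC for generation time means that the strain divides faster than the control.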
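The LPI of each replicate is then calculated by subtracting the LSC in the basal (Ctrl) condition from the LSC under acetic acid stress, e.g. for replicate 1: LPI_AA150 = LSC_AA150_R1 - LSC_Ctrl_R1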
#Replicate1_LPI_AA150
Val_data[, 113:115] <- Val_data[, 80:82]-Val_data[, 77:79]
colnames(Val_data)[113:115] <- paste0("LPI_", colnames(Val_data)[8:10])
#Replicate2_LPI_AA150
Val_data[, 116:118] <- Val_data[, 86:88]-Val_data[, 83:85]
colnames(Val_data)[116:118] <- paste0("LPI_", colnames(Val_data)[14:16])
#Replicate3_LPI_AA150
Val_data[, 119:121] <- Val_data[, 92:94]-Val_data[, 89:91]
colnames(Val_data)[119:121] <- paste0("LPI_", colnames(Val_data)[20:22])
#Replicate1_LPI_AA125
Val_data[, 122:124] <- Val_data[, 98:100]-Val_data[, 95:97]
colnames(Val_data)[122:124] <- paste0("LPI_", colnames(Val_data)[26:28])
#Replicate2_LPI_AA125
Val_data[, 125:127] <- Val_data[, 104:106]-Val_data[, 101:103]
colnames(Val_data)[125:127] <- paste0("LPI_", colnames(Val_data)[32:34])
#Replicate3_LPI_AA125
Val_data[, 128:130] <- Val_data[, 110:112]-Val_data[, 107:109]
colnames(Val_data)[128:130] <- paste0("LPI_", colnames(Val_data)[38:40])
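As a worked example of this subtraction, the rounded LSC values of the first strain (RSM28-TRg-6), as printed in the str() output of Val_data_curated further below, can be used. Because the inputs are rounded display values, the result is only approximate; the vector names are chosen for this sketch.

```r
# Rounded LSC values of RSM28-TRg-6 (replicates R1-R3), copied from the str() output
lsc_ctrl_gt  <- c(-0.00259, 0.103, 0.0557)   # LSC_Ctrl_GT_R1..R3
lsc_aa150_gt <- c(-0.239, -0.3087, 0.139)    # LSC_AA150_GT_R1..R3

# LPI = LSC under acetic acid stress - LSC in the basal condition
lpi_aa150_gt <- lsc_aa150_gt - lsc_ctrl_gt
print(round(lpi_aa150_gt, 3))

# Mean over the three replicates
print(round(mean(lpi_aa150_gt), 3))
```

The mean of the three replicate LPI values is close to the Mean_LPI_AA150_GT of -0.188 listed for this strain in Validation_LPI_all.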
We now estimate the mean and the standard deviation (sd) of the LPI response for each strain. Some strains did not manage to grow in all three replicates. For those strains, the mean and sd were calculated from the replicates that did grow. Consequently, when only one replicate is available for a particular condition, the sd is NA. A separate column records how many non-NA replicates were obtained for each strain in each condition.
for (i in 1:nrow(Val_data)){
#LPI_AA150_Lag
Val_data[i, 131] <- mean(as.numeric(Val_data[i, c(113, 116, 119)][which(!is.na(Val_data[i, c(113, 116, 119)]))]))
Val_data[i, 132] <- sd(as.numeric(Val_data[i, c(113, 116, 119)][which(!is.na(Val_data[i, c(113, 116, 119)]))]))
Val_data[i, 133] <- length(as.numeric(Val_data[i, c(113, 116, 119)][which(!is.na(Val_data[i, c(113, 116, 119)]))]))
#LPI_AA150_GT
Val_data[i, 134] <- mean(as.numeric(Val_data[i, c(114, 117, 120)][which(!is.na(Val_data[i, c(114, 117, 120)]))]))
Val_data[i, 135] <- sd(as.numeric(Val_data[i, c(114, 117, 120)][which(!is.na(Val_data[i, c(114, 117, 120)]))]))
Val_data[i, 136] <- length(as.numeric(Val_data[i, c(114, 117, 120)][which(!is.na(Val_data[i, c(114, 117, 120)]))]))
#LPI_AA150_Yield
Val_data[i, 137] <- mean(as.numeric(Val_data[i, c(115, 118, 121)][which(!is.na(Val_data[i, c(115, 118, 121)]))]))
Val_data[i, 138] <- sd(as.numeric(Val_data[i, c(115, 118, 121)][which(!is.na(Val_data[i, c(115, 118, 121)]))]))
Val_data[i, 139] <- length(as.numeric(Val_data[i, c(115, 118, 121)][which(!is.na(Val_data[i, c(115, 118, 121)]))]))
#LPI_AA125_Lag
Val_data[i, 140] <- mean(as.numeric(Val_data[i, c(122, 125, 128)][which(!is.na(Val_data[i, c(122, 125, 128)]))]))
Val_data[i, 141] <- sd(as.numeric(Val_data[i, c(122, 125, 128)][which(!is.na(Val_data[i, c(122, 125, 128)]))]))
Val_data[i, 142] <- length(as.numeric(Val_data[i, c(122, 125, 128)][which(!is.na(Val_data[i, c(122, 125, 128)]))]))
#LPI_AA125_GT
Val_data[i, 143] <- mean(as.numeric(Val_data[i, c(123, 126, 129)][which(!is.na(Val_data[i, c(123, 126, 129)]))]))
Val_data[i, 144] <- sd(as.numeric(Val_data[i, c(123, 126, 129)][which(!is.na(Val_data[i, c(123, 126, 129)]))]))
Val_data[i, 145] <- length(as.numeric(Val_data[i, c(123, 126, 129)][which(!is.na(Val_data[i, c(123, 126, 129)]))]))
#LPI_AA125_Yield
Val_data[i, 146] <- mean(as.numeric(Val_data[i, c(124, 127, 130)][which(!is.na(Val_data[i, c(124, 127, 130)]))]))
Val_data[i, 147] <- sd(as.numeric(Val_data[i, c(124, 127, 130)][which(!is.na(Val_data[i, c(124, 127, 130)]))]))
Val_data[i, 148] <- length(as.numeric(Val_data[i, c(124, 127, 130)][which(!is.na(Val_data[i, c(124, 127, 130)]))]))
}
#Assigning the column names
colnames(Val_data)[131:133] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA150_Lag")
colnames(Val_data)[134:136] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA150_GT")
colnames(Val_data)[137:139] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA150_Yield")
colnames(Val_data)[140:142] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA125_Lag")
colnames(Val_data)[143:145] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA125_GT")
colnames(Val_data)[146:148] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA125_Yield")
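How the mean, sd and replicate count behave when replicates are missing can be checked on a few toy rows. The helper `summarise_reps` and the values below are hypothetical and only mirror the NA filtering used in the loop.

```r
# Hypothetical helper: mean, sd and count of the replicates that are not NA
summarise_reps <- function(x) {
  x <- x[!is.na(x)]
  c(Mean = mean(x), SD = sd(x), N = length(x))
}

# Toy LPI replicates (made-up values), one row per strain
lpi <- rbind(
  all_grew  = c(-0.236, -0.412, 0.083),
  one_grew  = c(0.150, NA, NA),
  none_grew = c(NA, NA, NA)
)

# apply() returns one column per strain, so the result is transposed
res <- t(apply(lpi, 1, summarise_reps))
print(res)
```

With a single surviving replicate the sd is NA, and with no surviving replicate the mean is NaN and N is 0, which is how strains with N = 0 appear in the summary columns.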
At this point the dataset Val_data has 200 rows, which include the data of the 176 selected strains, the 7 control strains and the blank wells. The 7 control strains were present in both microtiter plates, so six replicates exist for each control strain. In this part, we eliminate the blank rows, extract the control strain data, generate one row of mean data for each control strain and finally add these rows to the data frame with the 176 selected strains.
dCtrl_strains <- c("CC23", "CC14", "CC2", "CC28", "CC30", "CC32", "CC34")
dCTRL_rows <- vector(mode = "integer", length = 0)
test <- data.frame()
Val_data_dCTRL <- data.frame()
for(i in 1:length(dCtrl_strains)){
test <- Val_data[which(Val_data$gRNA_name==dCtrl_strains[i]), ]
dCTRL_rows <- c(dCTRL_rows, which(Val_data$gRNA_name==dCtrl_strains[i]))
Val_data_dCTRL <- rbind(Val_data_dCTRL, test)
}
m1 <- vector(mode = "numeric", length = 0)
m2 <- vector(mode = "numeric", length = 0)
test1 <- data.frame()
Val_data_dCTRL_F <- data.frame()
Val_data_dCTRL_F[1:7, 1:3] <- Val_data_dCTRL[c("10", "19", "28", "46", "55", "64", "73"), 2:4]
for (i in 1:length(dCtrl_strains)){
test1 <- Val_data_dCTRL[which(Val_data_dCTRL$gRNA_name==dCtrl_strains[i]), ]
#LPI_AA150_Lag
m1 <- as.numeric(test1[1, c(113, 116, 119)][which(!is.na(test1[1, c(113, 116, 119)]))])
m2 <- as.numeric(test1[2, c(113, 116, 119)][which(!is.na(test1[2, c(113, 116, 119)]))])
Val_data_dCTRL_F[i, 4] <- mean(c(m1, m2))
Val_data_dCTRL_F[i, 5] <- sd(c(m1, m2))
Val_data_dCTRL_F[i, 6] <- length(c(m1, m2))
#LPI_AA150_GT
m1 <- as.numeric(test1[1, c(114, 117, 120)][which(!is.na(test1[1, c(114, 117, 120)]))])
m2 <- as.numeric(test1[2, c(114, 117, 120)][which(!is.na(test1[2, c(114, 117, 120)]))])
Val_data_dCTRL_F[i, 7] <- mean(c(m1, m2))
Val_data_dCTRL_F[i, 8] <- sd(c(m1, m2))
Val_data_dCTRL_F[i, 9] <- length(c(m1, m2))
#LPI_AA150_Yield
m1 <- as.numeric(test1[1, c(115, 118, 121)][which(!is.na(test1[1, c(115, 118, 121)]))])
m2 <- as.numeric(test1[2, c(115, 118, 121)][which(!is.na(test1[2, c(115, 118, 121)]))])
Val_data_dCTRL_F[i, 10] <- mean(c(m1, m2))
Val_data_dCTRL_F[i, 11] <- sd(c(m1, m2))
Val_data_dCTRL_F[i, 12] <- length(c(m1, m2))
#LPI_AA125_Lag
m1 <- as.numeric(test1[1, c(122, 125, 128)][which(!is.na(test1[1, c(122, 125, 128)]))])
m2 <- as.numeric(test1[2, c(122, 125, 128)][which(!is.na(test1[2, c(122, 125, 128)]))])
Val_data_dCTRL_F[i, 13] <- mean(c(m1, m2))
Val_data_dCTRL_F[i, 14] <- sd(c(m1, m2))
Val_data_dCTRL_F[i, 15] <- length(c(m1, m2))
#LPI_AA125_GT
m1 <- as.numeric(test1[1, c(123, 126, 129)][which(!is.na(test1[1, c(123, 126, 129)]))])
m2 <- as.numeric(test1[2, c(123, 126, 129)][which(!is.na(test1[2, c(123, 126, 129)]))])
Val_data_dCTRL_F[i, 16] <- mean(c(m1, m2))
Val_data_dCTRL_F[i, 17] <- sd(c(m1, m2))
Val_data_dCTRL_F[i, 18] <- length(c(m1, m2))
#LPI_AA125_Yield
m1 <- as.numeric(test1[1, c(124, 127, 130)][which(!is.na(test1[1, c(124, 127, 130)]))])
m2 <- as.numeric(test1[2, c(124, 127, 130)][which(!is.na(test1[2, c(124, 127, 130)]))])
Val_data_dCTRL_F[i, 19] <- mean(c(m1, m2))
Val_data_dCTRL_F[i, 20] <- sd(c(m1, m2))
Val_data_dCTRL_F[i, 21] <- length(c(m1, m2))
}
colnames(Val_data_dCTRL_F)[4:6] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA150_Lag")
colnames(Val_data_dCTRL_F)[7:9] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA150_GT")
colnames(Val_data_dCTRL_F)[10:12] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA150_Yield")
colnames(Val_data_dCTRL_F)[13:15] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA125_Lag")
colnames(Val_data_dCTRL_F)[16:18] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA125_GT")
colnames(Val_data_dCTRL_F)[19:21] <- paste0(c("Mean_", "SD_", "N_"), "LPI_AA125_Yield")
We add this recalculated mean (from 6 independent replicates), sd and N (number of replicates that managed to grow) for all control strains to the final dataset
Val_data_curated <- Val_data[-dCTRL_rows, ]
Val_data_curated <- Val_data_curated[-which(Val_data_curated$gRNA_name=="BLANK"), ]
Val_data_column_trimmed <- Val_data_curated[, c(2:4, 131:148)]
Validation_LPI_all <- rbind(Val_data_column_trimmed, Val_data_dCTRL_F)
rownames(Validation_LPI_all) <- Validation_LPI_all$gRNA_name
str(Validation_LPI_all)
## 'data.frame': 183 obs. of 21 variables:
## $ gRNA_name : chr "RSM28-TRg-6" "RGL1-NRg-7" "COG1-NRg-3" "COG1-TRg-1" ...
## $ CTRL_GT_Mean_all : chr "0.006053539" "-0.039136252" "-0.04627155" "-0.011279953" ...
## $ LPI_GT_Mean_all : chr "-0.191825576" "-0.278454226" "0.166542866" "0.490160515" ...
## $ Mean_LPI_AA150_Lag : num -0.0724 -0.2242 -0.1437 -0.1664 -0.1687 ...
## $ SD_LPI_AA150_Lag : num 0.171 0.0877 0.147 0.1777 0.1468 ...
## $ N_LPI_AA150_Lag : int 3 3 3 3 3 3 3 0 0 3 ...
## $ Mean_LPI_AA150_GT : num -0.188 -0.679 -0.501 0.088 -0.441 ...
## $ SD_LPI_AA150_GT : num 0.251 0.331 0.187 0.197 0.148 ...
## $ N_LPI_AA150_GT : int 3 3 3 3 3 3 3 0 0 3 ...
## $ Mean_LPI_AA150_Yield: num 0.2935 1.0539 0.9286 0.0906 1.1463 ...
## $ SD_LPI_AA150_Yield : num 0.619 0.435 0.133 0.835 0.124 ...
## $ N_LPI_AA150_Yield : int 3 3 3 3 3 3 3 0 0 3 ...
## $ Mean_LPI_AA125_Lag : num -0.0151 -0.1182 0.022 0.0555 -0.0532 ...
## $ SD_LPI_AA125_Lag : num 0.116 0.116 0.161 0.241 0.221 ...
## $ N_LPI_AA125_Lag : int 3 3 3 3 3 3 3 3 3 3 ...
## $ Mean_LPI_AA125_GT : num 0.1062 -0.0398 0.0342 0.205 0.024 ...
## $ SD_LPI_AA125_GT : num 0.225 0.1 0.195 0.142 0.153 ...
## $ N_LPI_AA125_GT : int 3 3 3 3 3 3 3 3 3 3 ...
## $ Mean_LPI_AA125_Yield: num -0.2491 0.0248 -0.0137 -0.0171 0.0335 ...
## $ SD_LPI_AA125_Yield : num 0.587 0.24 0.473 0.281 0.407 ...
## $ N_LPI_AA125_Yield : int 3 3 3 3 3 3 3 3 3 3 ...
To identify strains that showed significant tolerance or sensitivity under acetic acid stress in the liquid growth experiment, we use the Val_data_curated dataset (see STREAMLINING THE VALIDATION DATASET above), from which the rows of the control strains and all blank rows have already been removed. The control strain rows are kept in a separate data frame for the comparison.
Val_whole_data_dCTRL <- Val_data[dCTRL_rows, ]
For the statistical test, similar to scan-o-matic statistical method 4, the null hypothesis is that the difference between the mean (µ) phenotypic performance of a specific CRISPRi strain (StrainX), considering all independent experimental replicates (n=3), and the mean phenotypic performance of all CRISPRi control strains (with gRNAs targeting no genetic locus in S. cerevisiae) is zero, and that any difference within the phenotypic performance range of the CRISPRi control strains (LPI GT range) is due to chance alone.
Null Hypothesis : µStrainX(All_replicates_LPI_lag/GT/Yield) - µCRISPRi_Control_Strains(LPI_lag/GT/Yield) = 0
First, all replicates of the LPI values of all control strains at 125mM of acetic acid are extracted for lag, GT and yield, respectively, and saved into new vectors. Each strain with at least two non-NA replicates is then compared to these vectors with t.test, and the resulting p-values are adjusted with the Benjamini-Hochberg (BH) method.
Val_dCTRL_lag_125 <- c(as.numeric(Val_whole_data_dCTRL[, 122]), as.numeric(Val_whole_data_dCTRL[, 125]), as.numeric(Val_whole_data_dCTRL[, 128]))
Val_dCTRL_GT_125 <- c(as.numeric(Val_whole_data_dCTRL[, 123]), as.numeric(Val_whole_data_dCTRL[, 126]), as.numeric(Val_whole_data_dCTRL[, 129]))
Val_dCTRL_Yield_125 <- c(as.numeric(Val_whole_data_dCTRL[, 124]), as.numeric(Val_whole_data_dCTRL[, 127]), as.numeric(Val_whole_data_dCTRL[, 130]))
for(i in 1:nrow(Val_data_curated)){
test_lag <- t(Val_data_curated[i, c(122, 125, 128)])
test_GT <- t(Val_data_curated[i, c(123, 126, 129)])
test_Yield <- t(Val_data_curated[i, c(124, 127, 130)])
x1 <- sum(!is.na(test_lag[, 1]))
x2 <- sum(!is.na(test_GT[, 1]))
x3 <- sum(!is.na(test_Yield[, 1]))
if(x1>1){
P.value_lag_125<- t.test(Val_dCTRL_lag_125, test_lag[which(!is.na(test_lag[, 1]))])
Val_data_curated[i, 149] <- P.value_lag_125$p.value
} else {
Val_data_curated[i, 149] <- NA
}
if(x2>1){
P.value_GT_125<- t.test(Val_dCTRL_GT_125, test_GT[which(!is.na(test_GT[, 1]))])
Val_data_curated[i, 150] <- P.value_GT_125$p.value
} else {
Val_data_curated[i, 150] <- NA
}
if(x3>1){
P.value_Yield_125<- t.test(Val_dCTRL_Yield_125, test_Yield[which(!is.na(test_Yield[, 1]))])
Val_data_curated[i, 151] <- P.value_Yield_125$p.value
} else {
Val_data_curated[i, 151] <- NA
}
}
colnames(Val_data_curated)[149:151] <- c("P.value_lag_125", "P.value_GT_125", "P.value_Yield_125")
Val_data_curated[which(!is.na(Val_data_curated$P.value_lag_125)), 152] <- p.adjust(Val_data_curated$P.value_lag_125[which(!is.na(Val_data_curated$P.value_lag_125))],
method = "BH",
n = length(Val_data_curated$P.value_lag_125[which(!is.na(Val_data_curated$P.value_lag_125))]))
Val_data_curated[which(!is.na(Val_data_curated$P.value_GT_125)), 153] <- p.adjust(Val_data_curated$P.value_GT_125[which(!is.na(Val_data_curated$P.value_GT_125))],
method = "BH",
n = length(Val_data_curated$P.value_GT_125[which(!is.na(Val_data_curated$P.value_GT_125))]))
Val_data_curated[which(!is.na(Val_data_curated$P.value_Yield_125)), 154] <- p.adjust(Val_data_curated$P.value_Yield_125[which(!is.na(Val_data_curated$P.value_Yield_125))],
method = "BH",
n = length(Val_data_curated$P.value_Yield_125[which(!is.na(Val_data_curated$P.value_Yield_125))]))
colnames(Val_data_curated)[152:154] <- c("P.adj_lag_125", "P.adj_GT_125", "P.adj_Yield_125")
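The testing scheme above can be sketched on simulated data. The control values, strain names and replicate values below are made up; `t.test` is called with its defaults, which for two samples is the Welch test with unequal variances.

```r
set.seed(1)  # toy data only: simulated values, not taken from the experiment
ctrl_lpi <- rnorm(42, mean = 0, sd = 0.1)   # stand-in for the pooled control replicates

strains <- list(
  StrainA = c(0.45, 0.52, 0.48),   # clearly shifted away from the controls
  StrainB = c(0.01, -0.03, NA),    # one replicate failed to grow
  StrainC = c(0.10, NA, NA)        # fewer than two replicates: not tested
)

# One p-value per strain, NA when fewer than two replicates are available
p_raw <- sapply(strains, function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 1) t.test(ctrl_lpi, x)$p.value else NA_real_
})

# p.adjust() leaves NA in place and, with the default n, adjusts over the
# non-missing p-values only
p_adj <- p.adjust(p_raw, method = "BH")
print(cbind(p_raw, p_adj))
```

Because `p.adjust()` already skips missing p-values by default, passing the full vector gives the same adjusted values as subsetting the non-NA entries first.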
rownames(Val_data_curated) <- Val_data_curated$gRNA_name
str(Val_data_curated)
## 'data.frame': 176 obs. of 154 variables:
## $ Container.Name : chr "Well 1" "Well 2" "Well 3" "Well 4" ...
## $ gRNA_name : chr "RSM28-TRg-6" "RGL1-NRg-7" "COG1-NRg-3" "COG1-TRg-1" ...
## $ CTRL_GT_Mean_all : chr "0.006053539" "-0.039136252" "-0.04627155" "-0.011279953" ...
## $ LPI_GT_Mean_all : chr "-0.191825576" "-0.278454226" "0.166542866" "0.490160515" ...
## $ Ctrl_Lag_R1 : num 2.27 2.09 2.2 2.24 2.13 ...
## $ Ctrl_GT_R1 : num 2.65 2.57 2.61 2.7 2.59 ...
## $ Ctrl_Yield_R1 : num 1.84 2.04 1.97 2 1.97 ...
## $ AA150_Lag_R1 : num 54.5 40.8 47 47.1 46.7 ...
## $ AA150_GT_R1 : num 20 18.3 17.8 27.2 14.6 ...
## $ AA150_Yield_R1 : num 0.225 0.25 0.294 0.208 0.475 ...
## $ Ctrl_Lag_R2 : num 2.09 1.96 2.03 2.1 1.97 ...
## $ Ctrl_GT_R2 : num 2.94 2.53 2.58 2.68 2.57 ...
## $ Ctrl_Yield_R2 : num 1.86 2.24 2.1 1.96 2.06 ...
## $ AA150_Lag_R2 : num 38.9 34.7 36.5 35.7 34.2 ...
## $ AA150_GT_R2 : num 18.7 10.4 15.7 23.6 18.8 ...
## $ AA150_Yield_R2 : num 0.246 0.654 0.408 0.26 0.4 ...
## $ Ctrl_Lag_R3 : num 2.33 2.36 2.25 2.35 2.19 ...
## $ Ctrl_GT_R3 : num 2.8 2.74 2.66 2.75 2.5 ...
## $ Ctrl_Yield_R3 : num 2.04 2.06 2.19 2.24 2.61 ...
## $ AA150_Lag_R3 : num 60 49.9 54.3 57 49.7 ...
## $ AA150_GT_R3 : num 29.2 10.8 12.6 35.7 14.3 ...
## $ AA150_Yield_R3 : num 0.0891 0.401 0.3597 0.0628 0.5105 ...
## $ Ctrl_Lag_R4 : num 2.28 2.25 2.19 2.25 2.12 ...
## $ Ctrl_GT_R4 : num 2.92 2.75 2.75 2.89 2.6 ...
## $ Ctrl_Yield_R4 : num 2.21 2.03 1.94 1.76 2.38 ...
## $ AA125_Lag_R4 : num 20.2 18.7 21.6 24.8 20.2 ...
## $ AA125_GT_R4 : num 8.83 6.23 7.4 8.65 6.63 ...
## $ AA125_Yield_R4 : num 0.57 1.016 0.729 0.869 1.028 ...
## $ Ctrl_Lag_R5 : num 2.46 2.39 2.52 2.6 2.68 ...
## $ Ctrl_GT_R5 : num 2.66 2.56 2.75 2.8 2.72 ...
## $ Ctrl_Yield_R5 : num 2.2 2.42 1.86 2.02 1.9 ...
## $ AA125_Lag_R5 : num 18.6 16.2 18.7 18.3 17.2 ...
## $ AA125_GT_R5 : num 5.28 4.81 5.05 6.33 5.16 ...
## $ AA125_Yield_R5 : num 1.74 1.9 1.75 1.73 1.85 ...
## $ Ctrl_Lag_R6 : num 2.18 2.21 2.15 2.19 2.2 ...
## $ Ctrl_GT_R6 : num 2.78 2.77 2.73 2.79 2.65 ...
## $ Ctrl_Yield_R6 : num 2.27 2.07 2.07 2.23 2.26 ...
## $ AA125_Lag_R6 : num 20.7 18.2 20.9 22.3 20.3 ...
## $ AA125_GT_R6 : num 5.92 5.36 5.7 7.13 5.48 ...
## $ AA125_Yield_R6 : num 1.4 1.51 1.5 1.33 1.58 ...
## $ log_Ctrl_Lag_R1 : num 0.822 0.739 0.787 0.808 0.757 ...
## $ log_Ctrl_GT_R1 : num 0.973 0.944 0.958 0.994 0.952 ...
## $ log_Ctrl_Yield_R1 : num 0.609 0.712 0.676 0.694 0.677 ...
## $ log_AA150_Lag_R1 : num 4 3.71 3.85 3.85 3.84 ...
## $ log_AA150_GT_R1 : num 3 2.91 2.88 3.3 2.68 ...
## $ log_AA150_Yield_R1 : num -1.49 -1.388 -1.224 -1.57 -0.744 ...
## $ log_Ctrl_Lag_R2 : num 0.739 0.674 0.706 0.74 0.68 ...
## $ log_Ctrl_GT_R2 : num 1.079 0.927 0.948 0.986 0.945 ...
## $ log_Ctrl_Yield_R2 : num 0.618 0.807 0.74 0.674 0.723 ...
## $ log_AA150_Lag_R2 : num 3.66 3.55 3.6 3.57 3.53 ...
## $ log_AA150_GT_R2 : num 2.93 2.34 2.75 3.16 2.94 ...
## $ log_AA150_Yield_R2 : num -1.402 -0.424 -0.897 -1.348 -0.916 ...
## $ log_Ctrl_Lag_R3 : num 0.845 0.861 0.811 0.855 0.784 ...
## $ log_Ctrl_GT_R3 : num 1.031 1.009 0.978 1.012 0.918 ...
## $ log_Ctrl_Yield_R3 : num 0.712 0.724 0.784 0.804 0.959 ...
## $ log_AA150_Lag_R3 : num 4.09 3.91 3.99 4.04 3.91 ...
## $ log_AA150_GT_R3 : num 3.38 2.38 2.53 3.58 2.66 ...
## $ log_AA150_Yield_R3 : num -2.418 -0.914 -1.022 -2.768 -0.672 ...
## $ log_Ctrl_Lag_R4 : num 0.826 0.809 0.786 0.813 0.751 ...
## $ log_Ctrl_GT_R4 : num 1.07 1.011 1.01 1.061 0.954 ...
## $ log_Ctrl_Yield_R4 : num 0.793 0.708 0.665 0.566 0.867 ...
## $ log_AA125_Lag_R4 : num 3 2.93 3.07 3.21 3.01 ...
## $ log_AA125_GT_R4 : num 2.18 1.83 2 2.16 1.89 ...
## $ log_AA125_Yield_R4 : num -0.5618 0.0163 -0.3155 -0.1408 0.0275 ...
## $ log_Ctrl_Lag_R5 : num 0.899 0.873 0.925 0.957 0.985 ...
## $ log_Ctrl_GT_R5 : num 0.977 0.938 1.011 1.031 1 ...
## $ log_Ctrl_Yield_R5 : num 0.79 0.885 0.622 0.702 0.644 ...
## $ log_AA125_Lag_R5 : num 2.92 2.79 2.93 2.9 2.84 ...
## $ log_AA125_GT_R5 : num 1.66 1.57 1.62 1.84 1.64 ...
## $ log_AA125_Yield_R5 : num 0.552 0.642 0.561 0.548 0.614 ...
## $ log_Ctrl_Lag_R6 : num 0.78 0.793 0.764 0.785 0.787 ...
## $ log_Ctrl_GT_R6 : num 1.021 1.017 1.003 1.026 0.973 ...
## $ log_Ctrl_Yield_R6 : num 0.822 0.729 0.73 0.802 0.815 ...
## $ log_AA125_Lag_R6 : num 3.03 2.9 3.04 3.1 3.01 ...
## $ log_AA125_GT_R6 : num 1.78 1.68 1.74 1.96 1.7 ...
## $ log_AA125_Yield_R6 : num 0.34 0.412 0.403 0.284 0.459 ...
## $ LSC_Ctrl_Lag_R1 : num -0.0207 -0.1037 -0.0553 -0.0345 -0.0849 ...
## $ LSC_Ctrl_GT_R1 : num -0.00259 -0.03199 -0.01737 0.01855 -0.02336 ...
## $ LSC_Ctrl_Yield_R1 : num -0.16 -0.0577 -0.0929 -0.0757 -0.0925 ...
## $ LSC_AA150_Lag_R1 : num -0.0332 -0.3228 -0.1814 -0.1797 -0.1873 ...
## $ LSC_AA150_GT_R1 : num -0.239 -0.329 -0.358 0.0647 -0.557 ...
## $ LSC_AA150_Yield_R1 : num 0.451 0.553 0.717 0.371 1.197 ...
## $ LSC_Ctrl_Lag_R2 : num -0.104 -0.169 -0.136 -0.103 -0.162 ...
## $ LSC_Ctrl_GT_R2 : num 0.103 -0.049 -0.0273 0.0108 -0.0307 ...
## $ LSC_Ctrl_Yield_R2 : num -0.1511 0.0373 -0.029 -0.0954 -0.0465 ...
## $ LSC_AA150_Lag_R2 : num -0.369 -0.483 -0.435 -0.456 -0.499 ...
## $ LSC_AA150_GT_R2 : num -0.3087 -0.8989 -0.4824 -0.0736 -0.3005 ...
## $ LSC_AA150_Yield_R2 : num 0.539 1.517 1.044 0.593 1.024 ...
## $ LSC_Ctrl_Lag_R3 : num 0.00263 0.01828 -0.03106 0.01217 -0.05865 ...
## $ LSC_Ctrl_GT_R3 : num 0.0557 0.0339 0.0029 0.0364 -0.0577 ...
## $ LSC_Ctrl_Yield_R3 : num -0.0573 -0.045 0.0147 0.0352 0.1893 ...
## $ LSC_AA150_Lag_R3 : num 0.0633 -0.1209 -0.0374 0.012 -0.1254 ...
## $ LSC_AA150_GT_R3 : num 0.139 -0.855 -0.703 0.339 -0.577 ...
## $ LSC_AA150_Yield_R3 : num -0.478 1.027 0.918 -0.828 1.268 ...
## $ LSC_Ctrl_Lag_R4 : num -0.0167 -0.033 -0.0569 -0.0293 -0.0917 ...
## $ LSC_Ctrl_GT_R4 : num 0.0949 0.035 0.0344 0.0852 -0.0211 ...
## $ LSC_Ctrl_Yield_R4 : num 0.0236 -0.0611 -0.1048 -0.2036 0.0974 ...
## $ LSC_AA125_Lag_R4 : num -0.00492 -0.08097 0.06473 0.20324 -0.00257 ...
## $ LSC_AA125_GT_R4 : num 0.458 0.11 0.281 0.437 0.172 ...
## [list output truncated]
For correlation analysis, we need to extract the scan-o-matic data for the strains selected for validation experiment
validation_strains_data <- Analysis_Final_3[validation_strains, ]
dCTRL_data_scan_o_matic <- Analysis_Final_3[(Analysis_Final_3$gRNA_name %in% dCtrl_strains), ]
validation_strains_data <- rbind(validation_strains_data, dCTRL_data_scan_o_matic)
#setting the row order similar to that of the bioscreen dataset Validation_LPI_all
validation_strains_data <- validation_strains_data[rownames(Validation_LPI_all), ]
We now make a new data frame by extracting only the required columns from both data frames, i.e. the one with the scan-o-matic analysis of the validation strains (validation_strains_data) and the one with the data of the Bioscreen liquid growth experiment (Validation_LPI_all)
Validation_new_df <- validation_strains_data[, c(1:8, 96:97, 87, 35:40, 98:99, 89, 100, 102, 104:107, 58, 60, 74:79, 84, 86)]
Validation_new_df <- cbind(Validation_new_df, Validation_LPI_all[, 4:21])
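Since `cbind()` joins data frames by position and not by name, the row order of both data frames has to agree before they are combined. The sketch below shows the reordering by row names on two toy data frames; `som`, `bios` and their values are made up for illustration.

```r
# Toy scan-o-matic table, row names set to the strain names (made-up values)
som <- data.frame(
  gRNA_name       = c("A", "B", "C"),
  LPI_GT_Mean_all = c(0.5, -0.2, 0.1),
  row.names       = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Toy Bioscreen table with the same strains in a different order
bios <- data.frame(
  Mean_LPI_AA125_GT = c(0.05, 0.30, -0.10),
  row.names         = c("C", "A", "B")
)

# Reorder the scan-o-matic rows to follow the Bioscreen row order
som_aligned <- som[rownames(bios), ]

# Guard against silent misalignment before binding the columns
stopifnot(identical(rownames(som_aligned), rownames(bios)))
combined <- cbind(som_aligned, bios)
print(combined)
```

An explicit check like the `stopifnot()` line is a cheap safeguard, because indexing by a row name that does not exist yields a row of NA instead of an error.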
We will add a new column with an identifier that indicates the 50 most acetic acid tolerant and the 48 most acetic acid sensitive strains in this list. We already have a data frame with the list of these strains, i.e. bot_top_50
bot50 <- bot_top_50[1:48, ]
Top50 <- bot_top_50[49:98, ]
We add a separate column to categorize the strains used in the validation experiment
#dCTRL strains (1)
Validation_new_df[(Validation_new_df$gRNA_name %in% dCtrl_strains), 55] <- 1
#top 50 acetic acid tolerant strains (2),
Validation_new_df[(Validation_new_df$gRNA_name %in% Top50$gRNA_name), 55] <- 2
#48 most acetic acid sensitive strains (3),
Validation_new_df[(Validation_new_df$gRNA_name %in% bot50$gRNA_name), 55] <- 3
#Other candidates (4)
Validation_new_df[which(is.na(Validation_new_df$V55)), 55] <- 4
#Changing the column name
colnames(Validation_new_df)[55] <- "Strain_category"
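The same categorisation can be sketched by starting from the default category and overwriting it group by group. The toy strain names and the extra `Strain_label` column are assumptions for the example; the numeric codes follow the comments above.

```r
# Toy strain list and group memberships (hypothetical names)
toy <- data.frame(gRNA_name = c("CC23", "T1", "S1", "X1"), stringsAsFactors = FALSE)
controls  <- c("CC23")
tolerant  <- c("T1")
sensitive <- c("S1")

# Start with the default category and overwrite it for each group
toy$Strain_category <- 4                                  # other candidates
toy$Strain_category[toy$gRNA_name %in% controls]  <- 1    # control strains
toy$Strain_category[toy$gRNA_name %in% tolerant]  <- 2    # tolerant strains
toy$Strain_category[toy$gRNA_name %in% sensitive] <- 3    # sensitive strains

# Optional readable labels, e.g. for a plot legend
toy$Strain_label <- factor(toy$Strain_category, levels = 1:4,
                           labels = c("control", "tolerant", "sensitive", "other"))
print(toy)
```

Setting the default first avoids relying on the automatically generated column name when looking for rows that are still NA.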
str(Validation_new_df)
## 'data.frame': 183 obs. of 55 variables:
## $ gRNA_name : chr "RSM28-TRg-6" "RGL1-NRg-7" "COG1-NRg-3" "COG1-TRg-1" ...
## $ Seq : chr "GGAATTAAACTTAACGAAAC" "GCTCTTGTTTAGTAGGCGTG" "GACATAAGCATTCGTATAAT" "CATTCGTACAACAAATCTTG" ...
## $ SOURCEPLATEID : chr "R2877.H.002" "R2877.H.002" "R2877.H.001" "R2877.H.001" ...
## $ SOURCECOLONYCOLUMN : int 10 3 20 17 17 9 8 8 5 12 ...
## $ SOURCECOLONYROW : chr "E" "P" "L" "I" ...
## $ GENE : chr "RSM28" "RGL1" "COG1" "COG1" ...
## $ Control.gRNA : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Location_1536 : chr "I19" "AE5" "W39" "Q33" ...
## $ CTRL_GT_Mean_all : num 0.00605 -0.03914 -0.04627 -0.01128 -0.03901 ...
## $ n_CTRL : int 6 6 6 6 6 6 6 6 6 6 ...
## $ CTRL_GT_MEAN_RND1_2_SD : num 0.01124 0.00885 0.066 0.00894 0.0067 ...
## $ LPI_GT_RND1_R1 : num -0.2983 -0.383 0.2091 0.5628 0.0449 ...
## $ LPI_GT_RND1_R2 : num -0.35 -0.414 0.137 0.449 0.148 ...
## $ LPI_GT_RND1_R3 : num -0.254 -0.329 0.141 0.55 0.142 ...
## $ LPI_GT_RND2_R1 : num -0.0849 -0.156 0.2351 0.462 0.0213 ...
## $ LPI_GT_RND2_R2 : num -0.0374 -0.1805 0.1606 0.3855 -0.0465 ...
## $ LPI_GT_RND2_R3 : num -0.126 -0.208 0.116 0.532 0.089 ...
## $ LPI_GT_Mean_all : num -0.1918 -0.2785 0.1665 0.4902 0.0665 ...
## $ n_LPI : int 6 6 6 6 6 6 6 3 6 6 ...
## $ LPI_GT_MEAN_RND1_2_SD : num 0.15436 0.13712 0.00591 0.043 0.06396 ...
## $ P.value_M3 : num 8.93e-03 1.16e-03 5.99e-04 1.47e-05 2.09e-01 ...
## $ P.adjusted_M3 : num 0.08284 0.03436 0.02674 0.00739 0.39482 ...
## $ Midpoint_TSS_dist : int -70 -134 -173 -34 -132 -163 -160 -181 -127 -246 ...
## $ Norm_atac_seq_read_density: num 0.66 0.19 0.56 0.79 0.31 0.46 0.17 0.87 0.28 0.45 ...
## $ Multiple_ORFs_Targeted : int 0 1 0 0 0 0 0 0 0 0 ...
## $ nearby_genes : chr NA "YPL067C|-25" NA NA ...
## $ CTRL_Y_RND1_2_MEAN : num -0.0142 -0.1249 -0.0559 -0.3699 -0.1031 ...
## $ CTRL_Y_RND1_2_SD : num 0.0176 0.1077 0.0252 0.0714 0.0506 ...
## $ LPI_Y_RND1_R1 : num 0.479 -0.132 -0.173 -0.465 0.215 ...
## $ LPI_Y_RND1_R2 : num 0.5793 -0.0818 -0.112 -0.5196 0.151 ...
## $ LPI_Y_RND1_R3 : num 0.49 -0.135 -0.28 -0.492 0.168 ...
## $ LPI_Y_RND2_R1 : num -0.00748 0.3632 -0.19109 -0.43542 -0.12262 ...
## $ LPI_Y_RND2_R2 : num 0.0807 0.4621 -0.118 -0.5155 0.0336 ...
## $ LPI_Y_RND2_R3 : num 0.0817 0.6341 -0.0195 -0.3559 0.0466 ...
## $ LPI_Y_RND1_2_MEAN : num 0.2839 0.1851 -0.149 -0.4639 0.0818 ...
## $ LPI_Y_RND1_2_SD : num 0.2588 0.3419 0.0879 0.0617 0.1226 ...
## $ Mean_LPI_AA150_Lag : num -0.0724 -0.2242 -0.1437 -0.1664 -0.1687 ...
## $ SD_LPI_AA150_Lag : num 0.171 0.0877 0.147 0.1777 0.1468 ...
## $ N_LPI_AA150_Lag : int 3 3 3 3 3 3 3 0 0 3 ...
## $ Mean_LPI_AA150_GT : num -0.188 -0.679 -0.501 0.088 -0.441 ...
## $ SD_LPI_AA150_GT : num 0.251 0.331 0.187 0.197 0.148 ...
## $ N_LPI_AA150_GT : int 3 3 3 3 3 3 3 0 0 3 ...
## $ Mean_LPI_AA150_Yield : num 0.2935 1.0539 0.9286 0.0906 1.1463 ...
## $ SD_LPI_AA150_Yield : num 0.619 0.435 0.133 0.835 0.124 ...
## $ N_LPI_AA150_Yield : int 3 3 3 3 3 3 3 0 0 3 ...
## $ Mean_LPI_AA125_Lag : num -0.0151 -0.1182 0.022 0.0555 -0.0532 ...
## $ SD_LPI_AA125_Lag : num 0.116 0.116 0.161 0.241 0.221 ...
## $ N_LPI_AA125_Lag : int 3 3 3 3 3 3 3 3 3 3 ...
## $ Mean_LPI_AA125_GT : num 0.1062 -0.0398 0.0342 0.205 0.024 ...
## $ SD_LPI_AA125_GT : num 0.225 0.1 0.195 0.142 0.153 ...
## $ N_LPI_AA125_GT : int 3 3 3 3 3 3 3 3 3 3 ...
## $ Mean_LPI_AA125_Yield : num -0.2491 0.0248 -0.0137 -0.0171 0.0335 ...
## $ SD_LPI_AA125_Yield : num 0.587 0.24 0.473 0.281 0.407 ...
## $ N_LPI_AA125_Yield : int 3 3 3 3 3 3 3 3 3 3 ...
## $ Strain_category : num 2 2 4 4 4 4 4 4 4 4 ...
Figure 20 (fig. 3 in manuscript): Scatterplot of the relative performance of the strains in liquid medium with 125 mM acetic acid and on solid medium with 150 mM acetic acid (scan-o-matic screening). The linear regression of the data is displayed as a black line. The mean of the three LPI GT replicates of each strain is plotted, with control strains in green, acetic acid sensitive strains in red, acetic acid tolerant strains in blue and the remaining strains in black. The names of the genes repressed in the tolerant or sensitive strains are indicated in the plot.
##
## Call:
## lm(formula = Validation_new_df$Mean_LPI_AA125_GT ~ Validation_new_df$LPI_GT_Mean_all)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.35706 -0.10087 -0.00064 0.10143 0.71447
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.08035 0.02173 3.697 3e-04 ***
## Validation_new_df$LPI_GT_Mean_all 0.82652 0.03971 20.814 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2574 on 159 degrees of freedom
## (22 observations deleted due to missingness)
## Multiple R-squared: 0.7315, Adjusted R-squared: 0.7298
## F-statistic: 433.2 on 1 and 159 DF, p-value: < 2.2e-16
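The regression summarized above can be reproduced in miniature. The following is a minimal sketch on simulated data, not the validation data set: the data frame `toy` and its columns `x` and `y` are hypothetical stand-ins for the solid-medium and liquid-medium LPI values. It uses the `data =` argument of `lm()` so that the coefficient is labelled with a short name, and shows where the "observations deleted due to missingness" count comes from.

```r
# Toy data standing in for the screen (x) and validation (y) LPI values.
set.seed(1)
toy <- data.frame(x = rnorm(50))
toy$y <- 0.8 * toy$x + rnorm(50, sd = 0.2)
toy$y[c(3, 7)] <- NA  # incomplete rows are dropped by lm() under the default na.action

fit <- lm(y ~ x, data = toy)

slope     <- unname(coef(fit)["x"])   # regression slope
r_squared <- summary(fit)$r.squared   # "Multiple R-squared" in the summary
n_used    <- nobs(fit)                # rows actually used in the fit
n_dropped <- nrow(toy) - n_used       # rows removed because of NA
```

With the real data, `nrow(Validation_new_df) - nobs(fit)` should correspond to the number of observations reported as deleted in the summary.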
print("At 125mM Acetic Acid")
## [1] "At 125mM Acetic Acid"
print("lag vs GT")
## [1] "lag vs GT"
cor(Validation_new_df$Mean_LPI_AA125_Lag,
Validation_new_df$Mean_LPI_AA125_GT,
method = "pearson",
use = "complete.obs")
## [1] 0.5682123
print("lag vs Yield")
## [1] "lag vs Yield"
cor(Validation_new_df$Mean_LPI_AA125_Lag,
Validation_new_df$Mean_LPI_AA125_Yield,
method = "pearson",
use = "complete.obs")
## [1] -0.6062123
print("GT vs Yield")
## [1] "GT vs Yield"
cor(Validation_new_df$Mean_LPI_AA125_GT,
Validation_new_df$Mean_LPI_AA125_Yield,
method = "pearson",
use = "complete.obs")
## [1] -0.9102591
print("At 150mM Acetic Acid")
## [1] "At 150mM Acetic Acid"
print("lag vs GT")
## [1] "lag vs GT"
cor(Validation_new_df$Mean_LPI_AA150_Lag,
Validation_new_df$Mean_LPI_AA150_GT,
method = "pearson",
use = "complete.obs")
## [1] 0.1103463
print("lag vs Yield")
## [1] "lag vs Yield"
cor(Validation_new_df$Mean_LPI_AA150_Lag,
Validation_new_df$Mean_LPI_AA150_Yield,
method = "pearson",
use = "complete.obs")
## [1] -0.4132699
print("GT vs Yield")
## [1] "GT vs Yield"
cor(Validation_new_df$Mean_LPI_AA150_GT,
Validation_new_df$Mean_LPI_AA150_Yield,
method = "pearson",
use = "complete.obs")
## [1] -0.8419203
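The six pairwise calls above can also be collapsed into one correlation matrix per condition. This is a sketch on simulated data (the data frame `toy` and its columns are hypothetical). Note the choice of `use = "pairwise.complete.obs"`: with a whole data frame, `use = "complete.obs"` would first drop every row that has an NA in any column, which is not the same as the two-vector calls used above.

```r
# Toy phenotypes: Yield is constructed to be negatively related to GT.
set.seed(2)
toy <- data.frame(Lag = rnorm(30), GT = rnorm(30))
toy$Yield <- -toy$GT + rnorm(30, sd = 0.3)
toy$Lag[5] <- NA  # one missing lag value

# All pairwise Pearson correlations in a single call.
cor_mat <- cor(toy, method = "pearson", use = "pairwise.complete.obs")

# Each off-diagonal entry matches the corresponding two-vector call.
gt_vs_yield  <- cor(toy$GT,  toy$Yield, method = "pearson", use = "complete.obs")
lag_vs_gt    <- cor(toy$Lag, toy$GT,    method = "pearson", use = "complete.obs")
```

For the real data, the same call could be applied to the three `Mean_LPI_AA125_*` columns and, separately, to the three `Mean_LPI_AA150_*` columns.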
Visualizing the mean of the replicates of the selected strains
#Make a color palette
colfunc5<-colorRampPalette(c("goldenrod4", "goldenrod", "white", "turquoise", "turquoise4"))
plot(rep(1,100), col=colfunc5(100), pch=19,cex=2)
The function colfunc5, when called as colfunc5(100), creates a color palette of one hundred colors with white at the midpoint. The break argument therefore requires a numeric vector of length 100+1 in increasing order. The breaks are chosen so that all strains with an LPI value less than -0.02 get a shade of goldenrod: the deeper the shade of goldenrod, the more acetic acid tolerant the strain. The colors should also be evenly distributed within each range. Hence, we first created a vector of 48 elements equally spaced between -0.5 and -0.05
brk1 <- c(seq(-0.5, -0.05, length.out = 48))
Then between -0.04 to 0.04
brk2 <- c(seq(-0.04, 0.04, length.out = 5))
Finally, all strains with an LPI value greater than 0.02 get a shade of turquoise: the deeper the shade, the more acetic acid sensitive the strain. For that we create another numeric vector of 48 numbers equally spaced from 0.05 to 2. The sensitive range is much wider than the tolerant range, which is why its breaks are spaced further apart. Therefore, between 0.05 and 2
brk3 <- c(seq(0.05, 2, length.out = 48))
Combining we have a numerical vector of length 101 to be used for the break argument
brk_F <- c(brk1, brk2, brk3)
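A quick sanity check on the break vector is useful before passing it to a heatmap function. The sketch below rebuilds the same breaks, confirms that there is exactly one more break than colors and that the breaks are strictly increasing, and then maps a few example LPI values to their color bins with `findInterval()`. The example values in `lpi` are made up for illustration.

```r
n_col <- 100
pal <- colorRampPalette(c("goldenrod4", "goldenrod", "white",
                          "turquoise", "turquoise4"))(n_col)

brk <- c(seq(-0.5, -0.05, length.out = 48),   # tolerant range
         seq(-0.04, 0.04, length.out = 5),    # near-neutral range
         seq(0.05, 2, length.out = 48))       # sensitive range

# One more break than colors, and strictly increasing.
stopifnot(length(brk) == n_col + 1,
          !is.unsorted(brk, strictly = TRUE))

# Map hypothetical LPI values to color bins (1..100).
lpi  <- c(-0.3, 0.01, 1.2)
bin  <- findInterval(lpi, brk, all.inside = TRUE)
cols <- pal[bin]
```

Values outside the range of the breaks are clamped to the first or last bin by `all.inside = TRUE`; a heatmap function may treat such values differently.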
Arranging the rows in decreasing order of their mean phenotypic response (LPI) in generation time under the acetic acid condition in the scan-o-matic experiment
Validation_new_df <- Validation_new_df[order(Validation_new_df$LPI_GT_Mean_all, decreasing = TRUE), ]
The resulting list should therefore have the most AA sensitive strains at the beginning. However, the most AA sensitive strains did not grow in the AA condition. Because of their missing values (NA), they are positioned in the last 14 rows. We move them to the front of the list
Validation_new_df <- Validation_new_df[c(170:183, 1:169), ]
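The sorting and the manual move of the NA rows can also be expressed in one step, which avoids hard-coding the row indices. This is a sketch on a hypothetical four-row data frame; it relies on the `na.last` argument of `order()`, where `na.last = FALSE` places missing values first.

```r
# Toy table: strain "b" did not grow, so its LPI is NA.
toy <- data.frame(gRNA_name = c("a", "b", "c", "d"),
                  LPI_GT_Mean_all = c(0.1, NA, 0.5, -0.2))

# Decreasing order, with NA rows placed at the front.
toy_sorted <- toy[order(toy$LPI_GT_Mean_all,
                        decreasing = TRUE, na.last = FALSE), ]
```

Applied to the real data, this should give the same arrangement as the two steps above as long as the NA rows are exactly the ones that were moved.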
Note: The LPI_Yield values have an inverse profile compared with generation time (GT). LPI_Yield is positive for an acetic acid tolerant strain, i.e. its yield is higher than that of the control strains, whereas its LPI for GT is negative because its generation time is shorter than that of the control strains. Therefore, we multiplied the mean LPI_Yield values of the bioscreen output by -1 to avoid confusion in the color profile, and stored the results in two separate columns of the data.frame
Validation_new_df[, 56] <- Validation_new_df[, 43]*(-1)
Validation_new_df[, 57] <- Validation_new_df[, 52]*(-1)
colnames(Validation_new_df)[56:57] <- c("Mean_LPI_AA150_Yield(-1)", "Mean_LPI_AA125_Yield(-1)")
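The same sign flip can be written with column names instead of numeric indices, which keeps working if columns are added or reordered. The sketch below uses a hypothetical three-row data frame with made-up values; only the column names are taken from the data set above.

```r
toy <- data.frame(Mean_LPI_AA150_Yield = c(0.29, 1.05, NA),
                  Mean_LPI_AA125_Yield = c(-0.25, 0.02, 0.03))

yield_cols   <- c("Mean_LPI_AA150_Yield", "Mean_LPI_AA125_Yield")
flipped_cols <- paste0(yield_cols, "(-1)")

# Negate the yield columns and store them under the new names.
toy[flipped_cols] <- -toy[yield_cols]
```

Because the new names contain parentheses, they have to be accessed with `[[ ]]` or backticks rather than with a bare `$name`.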
Now plotting the heat map including the following columns. The index of each column in Validation_new_df is given in brackets
Figure 21 (fig. S2 in manuscript): Heatmap displaying the relative performance of 183 strains grown in liquid media. Column A and B show the mean LSC or LPI (n=6) of these strains, based on the solid media Scan-o-matic experiments, and columns C-H the mean LPI (n=3) of the strains based on growth in liquid media
Validation data with functional groups : Validation_new_df_with_Groups_by_GO.csv
Validation_new_df_Grp <- read.csv("COMPILED_DATA/Validation_new_df_with_Groups_by_GO.csv", stringsAsFactors = FALSE, na.strings = )
rownames(Validation_new_df_Grp) <- Validation_new_df_Grp$gRNA_name
INSTALL : factoextra
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
#Plotting PCA for only Proteasomal genes and control strains
Functional_group2 <- c("GO:0005839", "GO:0008540", "GO:0008541", "dCTRL")
dataset_pca_125mM_Proteasome <- na.omit(Validation_new_df_Grp[which(Validation_new_df_Grp$Group_BY_GO_Terms %in% Functional_group2), c(46, 49, 52, 55, 6, 58)])
res.pca_Proteasome <- prcomp(dataset_pca_125mM_Proteasome[, 1:3], scale. = TRUE)
## [1] "GO:0005839 = proteasome core complex"
## [1] "GO:0008540 = proteasome regulatory particle, base subcomplex"
## [1] "GO:0008541 = proteasome regulatory particle, lid subcomplex"
## [1] "dCTRL = Control strains"
Figure 22: PCA plot for only Proteasomal genes and control strains
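To make the PCA step concrete, here is a minimal sketch on simulated data with three phenotype columns (the data frame `toy` is hypothetical). It runs `prcomp()` with centering and scaling, as above, and extracts the proportion of variance explained by each component along with the per-strain scores that a PCA plot would display.

```r
# Toy phenotypes for 20 strains; Yield is negatively related to GT.
set.seed(3)
toy <- data.frame(Lag = rnorm(20), GT = rnorm(20))
toy$Yield <- -toy$GT + rnorm(20, sd = 0.3)

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Coordinates of each strain on the principal components.
scores <- pca$x
```

Scaling matters here because lag, generation time and yield are on different scales; without it the component directions would be dominated by the variable with the largest variance.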
Gene_set1_Proteasome <- c("RPN8", "RPN9", "RPN12", "RPT1", "RPT2", "RPT4", "PRE4", "PUP3")
dCTRL_GENES <- c("Ctrl_14", "Ctrl_2", "Ctrl_23", "Ctrl_28", "Ctrl_30", "Ctrl_32", "Ctrl_34")
barplot_dataset7 <- Validation_new_df_Grp[which(Validation_new_df_Grp$GENE %in% c(Gene_set1_Proteasome, dCTRL_GENES)), c(1, 49, 50)]
colnames(barplot_dataset7)[2:3] <- c("LPI_GT", "SD_GT")
library(reshape)
reshape_barplot_dataset7 <- reshape(data = barplot_dataset7, idvar = "gRNA_name",
                                    varying = list(colnames(barplot_dataset7)[2], colnames(barplot_dataset7)[3]),
                                    v.names = c("Mean", "SD"),
                                    times = c("LPI_GT"),
                                    new.row.names = 1:10000,
                                    direction = "long")
Figure 23: (Fig7A in manuscript) Barplot of relative generation time in liquid medium of CRISPRi strains with gRNAs targeting genes encoding proteasomal subunits (20S CP; core particle, 19S lid or 19S base) and the control strains
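The wide-to-long step used to prepare the barplot data can be illustrated on a two-row table. This is a sketch with made-up values; it calls `reshape()` from base R's stats package with the same arguments as above (`idvar`, `varying`, `v.names`, `times`, `direction`), so that the mean and SD columns end up as `Mean` and `SD` with the phenotype label in a `time` column.

```r
wide <- data.frame(gRNA_name = c("g1", "g2"),
                   LPI_GT = c(-0.2, 0.1),
                   SD_GT  = c(0.05, 0.02))

long <- reshape(data = wide, idvar = "gRNA_name",
                varying = list("LPI_GT", "SD_GT"),  # one list element per output column
                v.names = c("Mean", "SD"),
                times = "LPI_GT",                   # label stored in the `time` column
                direction = "long")
```

With a single phenotype the long table has the same number of rows as the wide one; the format becomes useful when several phenotypes are stacked and plotted side by side.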
test <- Val_data_curated[as.character(Validation_new_df_Grp[which(Validation_new_df_Grp$GENE %in% Gene_set1_Proteasome), 1]), c(2, 150)]
print("P-value ≤ 0.05")
## [1] "P-value ≤ 0.05"
test[which(test$P.value_GT_125<=0.05), ]
## gRNA_name P.value_GT_125
## PRE4-NRg-9 PRE4-NRg-9 1.707167e-05
## PRE4-NRg-4 PRE4-NRg-4 3.710185e-08
## PUP3-TRg-6 PUP3-TRg-6 5.746297e-03
## PUP3-TRg-5 PUP3-TRg-5 3.157717e-03
## PUP3-TRg-10 PUP3-TRg-10 1.365917e-02
## RPN12-TRg-6 RPN12-TRg-6 3.882935e-03
## RPN9-NRg-4 RPN9-NRg-4 5.275561e-03
## RPN9-NRg-3 RPN9-NRg-3 1.297081e-02
## RPN9-TRg-1 RPN9-TRg-1 4.770957e-02
## RPN9-NRg-7 RPN9-NRg-7 4.100466e-02
## RPT4-TRg-1 RPT4-TRg-1 3.933973e-02
## RPT4-NRg-2 RPT4-NRg-2 2.772957e-03
Val_data_lsc_mean <- Val_data_curated[, 2:4]
for(i in 1:nrow(Val_data_curated)){
Val_data_lsc_mean[i, 4] <- mean(na.omit(as.numeric(Val_data_curated[i, c(78, 84, 90, 96, 102, 108)])))
Val_data_lsc_mean[i, 5] <- sd(na.omit(as.numeric(Val_data_curated[i, c(78, 84, 90, 96, 102, 108)])))
}
row.names(Val_data_lsc_mean) <- Val_data_lsc_mean$gRNA_name
barplot_dataset8 <- Val_data_lsc_mean[as.character(barplot_dataset7$gRNA_name), c(1, 4:5)]
barplot_dataset8 <- barplot_dataset8[-c(1:7), ]
colnames(barplot_dataset8)[2:3] <- c("LSC_GT", "SD_GT")
library(reshape)
reshape_barplot_dataset8 <- reshape(data = barplot_dataset8, idvar = "gRNA_name",
                                    varying = list(colnames(barplot_dataset8)[2], colnames(barplot_dataset8)[3]),
                                    v.names = c("Mean", "SD"),
                                    times = c("LSC_GT"),
                                    new.row.names = 1:10000,
                                    direction = "long")
Figure 24: Barplot of normalized generation time in liquid medium of CRISPRi strains with gRNAs targeting genes encoding proteasomal subunits (20S CP; core particle, 19S lid or 19S base)
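The row-wise mean and SD of the six replicate columns, computed in the loop above, can also be obtained without an explicit loop. The sketch below uses a hypothetical three-replicate table with made-up values; `rowMeans(..., na.rm = TRUE)` and `apply(..., 1, sd, na.rm = TRUE)` ignore missing replicates in the same way as `na.omit()` does inside the loop.

```r
toy <- data.frame(r1 = c(0.1, NA, 0.3),
                  r2 = c(0.2, 0.4, NA),
                  r3 = c(0.3, 0.6, NA))
rep_cols <- c("r1", "r2", "r3")

# Mean and SD per row, ignoring missing replicates.
toy$Mean <- rowMeans(toy[, rep_cols], na.rm = TRUE)
toy$SD   <- apply(toy[, rep_cols], 1, sd, na.rm = TRUE)
```

A row with a single remaining replicate gets a mean but an SD of NA, which is worth keeping in mind when error bars are drawn.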
LID_strains <- Validation_new_df_Grp$gRNA_name[which(Validation_new_df_Grp$Group_BY_GO_Terms %in% c("GO:0008541"))]
BASE_strains <- Validation_new_df_Grp$gRNA_name[which(Validation_new_df_Grp$Group_BY_GO_Terms %in% c("GO:0008540"))]
CP_strains <- Validation_new_df_Grp$gRNA_name[which(Validation_new_df_Grp$Group_BY_GO_Terms %in% c("GO:0005839"))]
LID_strains_sig <- LID_strains[c(1, 6:9)]
BASE_strains_sig <- c("RPT4-NRg-2")
CP_strains_sig <- CP_strains[c(1:2, 5:7)]
Figure 25: (Fig7B in manuscript) Boxplot of relative growth yield in liquid medium with the data of all significantly acetic acid tolerant CRISPRi strains with gRNAs targeting genes encoding proteasomal subunits (20S CP; core particle, 19S lid or 19S base) and the control strains
P_val_dCTRL_LID_Yield <- t.test(as.numeric(as.matrix(Val_whole_data_dCTRL[, c(124, 127, 130)])),
as.numeric(as.matrix(Val_data_curated[which(Val_data_curated$gRNA_name %in% LID_strains_sig), c(124, 127, 130)])))
P_val_dCTRL_LID_Yield$p.value
## [1] 0.0001320927
P_val_dCTRL_BASE_Yield <- t.test(as.numeric(as.matrix(Val_whole_data_dCTRL[, c(124, 127, 130)])),
as.numeric(as.matrix(Val_data_curated[which(Val_data_curated$gRNA_name %in% BASE_strains_sig), c(124, 127, 130)])))
P_val_dCTRL_BASE_Yield$p.value
## [1] 0.08016981
P_val_dCTRL_CP_Yield <- t.test(as.numeric(as.matrix(Val_whole_data_dCTRL[, c(124, 127, 130)])),
as.numeric(as.matrix(Val_data_curated[which(Val_data_curated$gRNA_name %in% CP_strains_sig), c(124, 127, 130)])))
P_val_dCTRL_CP_Yield$p.value
## [1] 0.001141934
Figure 26: (Fig7C in manuscript) Boxplot of relative lag phase in liquid medium with the data of all significantly acetic acid tolerant CRISPRi strains with gRNAs targeting genes encoding proteasomal subunits (20S CP; core particle, 19S lid or 19S base) and the control strains
P_val_dCTRL_LID_Lag <- t.test(as.matrix(Val_whole_data_dCTRL[, c(122, 125, 128)]),
as.numeric(as.matrix(Val_data_curated[which(Val_data_curated$gRNA_name %in% LID_strains_sig), c(122, 125, 128)])))
P_val_dCTRL_LID_Lag$p.value
## [1] 0.6740559
P_val_dCTRL_BASE_Lag <- t.test(as.matrix(Val_whole_data_dCTRL[, c(122, 125, 128)]),
as.numeric(as.matrix(Val_data_curated[which(Val_data_curated$gRNA_name %in% BASE_strains_sig), c(122, 125, 128)])))
P_val_dCTRL_BASE_Lag$p.value
## [1] 0.6862921
P_val_dCTRL_CP_Lag <- t.test(as.matrix(Val_whole_data_dCTRL[, c(122, 125, 128)]),
as.numeric(as.matrix(Val_data_curated[which(Val_data_curated$gRNA_name %in% CP_strains_sig), c(122, 125, 128)])))
P_val_dCTRL_CP_Lag$p.value
## [1] 0.05837092
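The comparisons above pool all replicate values of the control strains into one vector and all replicate values of a group of strains into another, then compare the two with `t.test()`. The following sketch reproduces that pattern on simulated matrices (`ctrl` and `grp` are hypothetical). By default `t.test()` performs Welch's test, which does not assume equal variances, and it drops missing values.

```r
set.seed(4)
ctrl <- matrix(rnorm(21, mean = 0,   sd = 0.1), ncol = 3)  # 7 control strains x 3 replicates
grp  <- matrix(rnorm(15, mean = 0.5, sd = 0.1), ncol = 3)  # 5 test strains x 3 replicates
grp[2, 1] <- NA                                            # one failed replicate

# Flatten both matrices and compare the pooled values.
res <- t.test(as.numeric(ctrl), as.numeric(grp))
p_value <- res$p.value
```

Pooling treats every replicate as an independent observation, including replicates of the same strain, so the resulting p-values are best read with that assumption in mind.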